Data Science and AI Quest: dbms

Showing posts with label dbms. Show all posts

Tuesday, March 30, 2021

Working with Data Objects in R

Working with Data Objects in R

* Data Objects are the fundamental items that one can work with in R language

* Carrying out analysis on one's given data and making sense of the results are the most important reasons of using R . In this article methods of working over a given data set would be provided and along with that understanding the results associated with the data objects of R is the central idea for working over R

* One can learn to use the different forms of data that are associated with R and how to convert the data from one form to another form

* Over R , one would also learn the techniques of sorting and rearranging the data

* Some of the processes involved with Data Objects manipulation in R are in following manner :

==========================

Manipulating Data Objects in R

==========================

- When you collect the data the very forst step is to get the data over to the R console

- After the data is imported over to the R console the very first step to perform over the data is to do summary statistics steps for finding out the requisite results from the data

- One may try to find more of some analytical results from the data from the given summary statistics of R as one may want to manipulate the data that is present over the R software .

- One may want to reorder the data into newer and more informative manner from the first version of the imported data otherwise one may also try to do a more informative manner of data reordering .

- Data Manipulation exercises of the given manner could also be performed over the data like extraction of certain parts of complex form of data etc could also be done from the data .

- As given there are multiple ways of manipulation of the data and thus understanding how to do this is the most important aspect of learning about R because the more one knows about the working of R language and the way R handles the objects , it is better to make use of R as an analytical tool

Set-Difference Operations in RDBMS - an explanatory blog

Set-Difference Operations in RDBMS

* The Set-Difference operation , denoted by "-" allows us to find the tuples that are in one relation but not in another .

* The expression , ( r - s ) produces a relation containing those tuples in r but not in s .

* One can find all the customers of the bank who have an account but not a loan by writing in the below manner :

* In the case of the Union Operation , one must ensure that the set differences are taken between compatible relations .

* For a set-difference operation where "r-s" to be valid , one requires that the relations r and s be of the same arity , and that the domains of the ith attribute of r and the ith attribute of s should be the same .

Monday, March 15, 2021

Types of Variables in Python ( related to Classes and Objects in OOPS )

Types of Variables in Python

* The variables which are written inside a class are of primarily two types :

1) Instance Variables

2) Class Variables / Static Variables

* Instance variables are the variables whose separate copy is created in every instance ( or object )

* For example , if 'x' is an instance variable and if we create 2 instance variables then there will be 2 copies of 'x' in the instances .

* When we modify the copy of 'x' in any instance, then it will not modify the other two copies .

==========================================
Python Program to understand Instance Variables
==========================================

# instance variable example

Class Sample:

# this is a constructor

def __init__(self):

self.x = 10

# this is an instance method

def modify(self):

self.x += 1

# Create 2 instances

s1 = Sample()

s2 = Sample()

print(' x in s1 = ', s1.x)

print(' x in s2 = ', s2.x)

# modify x in s1

s1.modify()

print(' x in s1 = ' , s1.x)

print(' x in s2 = ' , s2.x)

# modify x in s1

s1.modify()

print(' x in s1 = ', s1.x)

print(' x in s2 = ', s2.x)

Output

x in s1 = 10

x in s2 = 10

x in s1 = 11

x in s2 = 10

================================================================

* Instance Variables are defined and initialised using a constructor with 'self' parameter . In the above program it can be realised in the line just after the usage of the docstring over where the code for declaration and definition of the Instance Variables is mentioned and in the succeeding line the __init__(self) is defined .

* In order to access the instance variables one needs Instance methods with 'self' as the first parameter which can be observed from the above code as well

* It is possible that the instance methods may have other parameters in addition to the 'self' parameter .

* In order to access the instance variables , one can use a self.variable in a program .

* It is also possible to access the instance variables from outside the class as : instancename.variable e.g. s1.x .

* Unlike instance variables , class variables are the variables whose single copy is available to all the instances of the class .

Wednesday, March 3, 2021

Cardinality Ratio concept in "Database and Management Systems" with explanatory figurative example

Cardinality Ratio concept in DBMS

Cardinality Ratio

It is the number of relationship instance that an entity can participate in . If one tries to understand the relationship between Student and Guide , then the relationship between the Student entity and the guide entity can be described in the following entity - relationship diagram .

Considering the given case , one can observe and try to understand the given relationship with the help of a Entity Relation Diagram with the help of a Set Diagram .

Here , some scenarios emerge like the given case :

Case-1

One guide (G1) can guide two students (S1) and (S2) whereas (S1) can only be guided by guide (G1) .

Case-2

Third Student (S3) can be guided by guide (G2) .

This is a kind of restriction set by the relationship where both the entity sets are mutually associated to each other by the relation between them . So here , the number of instance objects relation among each other is restricted and it can be also observed how the cardinality relationship is mapping one particular instance to another particular instance object through the given relationship diagram which depicts the way the entities can participate in .

From the above one can get a pretty good understanding of what a Cardinality Ratio is : -

Definition of Cardinality Ratio

The number of relationship instances that an entity can participate in is called as the Cardinality Ratio .

From the above diagram it can be noted that , one department would have only one HOD . So , in this case , the relationship would be only 1 is to 1 .

From the diagram we deduce that :

Relationship R1 exists from Department D1 to HOD H1 .

Relationship R2 exists from Department D2 to HOD H2 .

Relationship R3 exists from the Department D3 to HOD H3 .

A scenario over where such a type of relationship exists where there is only one relationship mapping from one Set's instance object to another Set's instance object is called a One to One relationship .

One can get a better understanding of this through the help of a E-R diagram shown at the bottom of the above figure .

===================================================================

The second type of relationship that exists is called One to Many Relationship

In the given figure , one can notice that there is a relationship existing between many departments and one student . Each department will have multiple students and thus one can notice from the given relation diagram that multiple relations exist from one department to student of another set but the student would be associated with only one department .

This is an instance of Many to One Relationship which is depicted by the Ratio form of representation (1:M) .which is another form of cardinality ratio expressed in the form of Many to One relation .

This means that Many Instances of any particular Entity Type will be associated or will be participating in the "Has" relationship .

Many to Many Relationship

In the given example , if one can see then one would be able to determine that there is Many to Many Relationship between the students set and Subjects set .

This can be rightfully depicted in the form , Relation between a Student on the left hand side and Subject on the right hand side of the Relationship Diagram . One can notice that multiple instance objects belonging to the set "Student" bear a many to many relationship between students of the other entity set which is the "Subject" set .

The depiction of the relation has been done through the help of an E-R diagram .

===================================================================

Now , we can get to understand the behaviour of these relationships that depict the manner in which relationships exist between instance objects of one set with another or multiple other sets in the article's showcased manner

Tuesday, March 2, 2021

Basic Domain Types Supported by SQL

SQL - Structured Query Language ( a basic course overview & revision on the paradigms and features of SQL as a Relational Database Management Language )

SQL - Structured Query Language

( a basic overview )

· The backbone of any Query Based Language is Relational Algebra which provides a concise , formal notation for representation of queries . However , commercial database systems required a query language which would be much user friendly and for this purpose SQL short for Structured Query Language was created which was the most influential commercially marketed query language .

· SQL uses a combination of relational algebra and relational calculus constructs for its construction and implementation

· Although we refer to SQL language as a "query language" , SQL can do much more than just query a database . SQL can define the structure of the data , modify the data in the database and it can also specify security constraints . This means that SQL can present to its user a brief highlight of the size of the database that the user is working upon . SQL can also be used to define any database and also the tables that a user wants to create and maintain over a database residing in the system as well as server memory . Also , on top of that SQL can be used to modify the structure of the database over which other data reside . Also , SQL can be used to specify the various security constraints that need to be defined over a database which would make the data specific to the users it is meant for by provisioning the access specifiers for access limitation as well as defining the needed integrity constraints which would ensure the ACID properties of a suitable relational database management system .

· In this article a complete User's Guide and How to Use / Implement a SQL installation won't be discussed nor is it intended for in this basic article but the aim of the article is to present SQL's fundamental constructs and concepts for brief usage by any User who is not only new to SQL but may be an intermediate or highly skilled versatile user of the language .

· Background of SQL - IBM developed the original version of SQL which was originally called as "Sequel" ( Any of the present day readers of the article can perhaps think of the word Sequel as some sort of movie sequel kind of the unlimited sequels of the Star Wars series .. seems a poor joke ) as part of the System R project of the early 1970's . The Sequel language has evolved since then , and its name has changed to SQL ( Structured Query Language ) . Many applications and products in the form of standalone and server based products now implement SQL as a part of their database engine and thus over the years SQL has become the supreme rdbms package which is not only widely used but also the most user friendly product among all the users of the current database users .

· In 1986 , the American National Standards Institute ( ANSI ) and the International Organisation of Standardisation ( ISO ) published an SQL standard which used to be called as SQL-86 at that point of time ( the postfix 86 succeeding SQL obviously comes from the manufacturing year of 1986 ) . Following this the standardisation organisation ANSI published an extended standard for SQL , SQL -89 in the year 1989 . The next version of the standard was then published in the year 1992 and thus the version was therefore called as SQL-92 standard , and then later versions followed like SQL-1999 etc . The most recent version that I have worked upon and also is installed over my system is the SQL-2003 version which serves most of the purpose . Apart from some server connection issues and importing data from other incompatible transactional data formats , I have not faced any major hurdles while using the present version of standalone SQL installation , However , if one wants to read a full scale bibliographic notes reference of these standards , one may read them in the documentation notes of the accompanying installation CD or executable software .

· SQL language has the several parts namely :

1) Data Definition Language (DDL)

The SQL DDL provides commands for defining the relation schemas , deletion of the relations and modification of the relation schemas .

2) Interactive Data Manipulation Language (DML )

The SQL DML includes a query language based on both the relational algebra and the tuple relational calculus . Which means that one can enter queries over a table and find out the necessary information from the database . The search query is mostly written in a language which implements the core

principles and working of relational algebra and relational calculus as their underlying coding platforms

3) Integrity

The SQL DDL includes commands for specifying integrity constraints that the data stored in the database must satisfy. All the necessary updates that violate the integrity constraints are not allowed over the query engine of SQL.

4) View Definition

The SQL DDL includes commands for defining the views upon a database. Views are nothing but a sub-part representation of a main table present over a database and it consists of only the structure of the database with filtered or unfiltered data from the main table .However any form of DML statements cannot be made to work upon a view.

5) Transaction Control

SQL includes commands for specifying the beginning and ending of the transactions . The broader meaning of this statement says about the way in which stored procedures , cursors and triggers could be used for the sake of making and breaking any transaction at the desired point of time

6) Embedded SQL and Dynamic SQL

Embedded and Dynamic SQL defines the manner in which SQL statements can be embedded within general purpose programming languages such as C , C++ , Java , PL/I , Cobol , Pascal and Fortran .

7) Authorization

The SQL DDL includes commands for specifying access rights to relations and views . This means that SQL provides its users with the potential to access the tables present over a given / created / hosted database specific rights for access and views to its data within the database . Many database systems support most of the SQL-92 standards and some of the new constructs which are present in SQL : 1999 and SQL : 2003 although no databases provide non-standard features which defer from the detailed and provided features of SQL as provided in the standard specification manual of the different release versions of SQL .

Friday, February 26, 2021

Keys of a Relational Database System ( detailed description with example on Primary Key Foreign Key , Candidate Key , Super Key )

· One must have a way to specify how the tuples within a given relational table are distinguished from one another which is usually done with the help of attributes of the relation which means that the attribute values of a tuple must be as such so that they can uniquely identify any given tuple from a given database

· In other words , no two tuples within a relation are allowed to have exactly the same value for all the attributes within a particular given relational table

· So for the easier and effective identification of any unique tuple or row from a database the concept of recognition using "Superkey" was coined .

· A Superkey is a set of one or more attributes that when taken collectively allows the identifier to uniquely identify a tuple within the relation .

· Example :

"customer_id" attribute of the relation Customer is sufficient to distinguish one customer tuple from another tuple . Therefore , "customer_id" is a superkey in the relation .Similarly , the combination of the following attributes "customer_name" and "customer_id" is a superkey for the relation "customer" . The "customer_name" attribute of "customer" is not a superkey because several people might have the same name .

· The concept of superkey is not sufficient since a superkey may also contain extraneous attributes within it .

· If one is often interested in superkeys over a tuple for which no proper subset is a superkey then such minimal superkeys are called as "candidate keys"

· In such a scenario , several distinct sets of attributes serve as a candidate key for a relation .Suppose a combination of "customer_name" and "customer_street" is sufficient to distinguish among members of the "customer" relation then both "customer_id" and {"customer_name" , " customer_street"} are called as "candidate keys"

· Although the attributes "customer_id" and "customer_name" are together used to distinguish a "customer" tuple .. the combination does not form a candidate key since the attribute "customer_id" alone is a candidate key .

· One can use the term "Primary Key" to denote a candidate key which is chosen by a database designer as the principal means of identifying the tuples within a Relation

· A key ( whether primary , candidate or super) is a property of the entire relation rather than the individual tuples of the relation . Any two individual tuples in the relation are prohibited from having the same value upon the KEY attribute at any point of time . The designation of a Key represents a constraint in the real world enterprise being modelled .

· Candidate Keys must be chosen with utmost care . As noted , the name of person for being selected as a form of candidate Key is not completely sufficient since a situation may arise from the given scenario where multiple people with the same initials might happen and in such a case , all the data might be fetched with the same initials at any given point of time within the database . And as such duplicacy of value for such a candidate key is not entertained and as such any two tuples within the relation are prohibited from having the same value on the key attribute at the same point of time . The designation of a Key represents a constraint in the real world enterprise being modelled .

· The Primary Key should be chosen in such a manner that its attribute values are never or rarely changed . For example , the address field of a person should not be part of the primary key , since the value is likely to change with the shifting of base / home from time to time . In the similar order , Social Security numbers of the dweller of any place can never change and remains the same from the time of birth to the time of death of the citizen . In the similar manner , Unique Identifiers generated by enterprises against any transaction are not likely to change and remain constant throughout and the given field could be considered as a primary key for a transaction relation .

· Therefore , formally reiterating once again , if R could be considered as a Relation Schema and one would say that a subset K of R is a superkey for the table then the framer of the relational table restricts consideration to relations r(R) in which no two distinct tuples have the same values on all the attributes in K . This means that if tuple t1 and tuple t2 are in relation r and t1 != t2 then t1[K] != t2[K] .

· A relation schema , say r1 may include among its attributes the primary key of another relation schema say r2 where this attribute is called as a foreign Key from relation r1 referencing the relation r2 . Here , the relation r2 is also called as the "referencing relation" of the the Foreign Key dependency and "r2" is called as the referenced relation of the foreign Key .

· For example , the attribute branch_name in Account schema or relation is a foreign key . For example , the attribute "branch_name" in Account schema is a foreign key from Account schema referencing Branch schema since branch_name is the primary key of Branch Schema . In any database instance , given any tuple say "tn" from the "Account" relation there must be some tuple say "tn" from the "Account" relation there must be some tuple say "t2" in the branch relation such that the value of the branch_name attribute of tn is the same as the value of the primary key that is "branch_name" of tb

· Therefore , it is customary to list the primary key attributes of a relation schema before other attributes for example , the attribute branch_name of Branch Schema is listed first since it is the primary key

· A database schema along with primary key and foreign key dependencies can be depicted pictorially as per the below given schema diagram . In the figure , a schema diagram for the banking enterprise has been depicted . Here , each relation appears as a Box with its attributes listed inside them and the name of the relation written above them . Hence , if there are primary key attributes , then a horizontal line crosses the box with the primary key attributes listed above the line in grey . Foreign key dependencies appear in the form of arrows from the foreign key attributes of the referencing relation to the primary key of the referenced relation .

Many database systems provide design tools with a graphical user interface for the

creation of schema diagrams .

Data Science and AI Quest

Tuesday, March 30, 2021

Working with Data Objects in R

Set-Difference Operations in RDBMS - an explanatory blog

Wednesday, March 17, 2021

Graphic Representation of all the involved Processes in Data Processing in Data Analytics and Data Science

Monday, March 15, 2021

Types of Variables in Python ( related to Classes and Objects in OOPS )

==========================================
Python Program to understand Instance Variables
==========================================

Wednesday, March 3, 2021

Cardinality Ratio concept in "Database and Management Systems" with explanatory figurative example

Tuesday, March 2, 2021

Basic Domain Types Supported by SQL

SQL - Structured Query Language ( a basic course overview & revision on the paradigms and features of SQL as a Relational Database Management Language )

Friday, February 26, 2021

Keys of a Relational Database System ( detailed description with example on Primary Key Foreign Key , Candidate Key , Super Key )

One Hot Encoding and Dummy Variables Generation upon a dataframe | Scenario - Perform One-Hot Encoding upon Un-Ordered Data in a sample dataframe and generate One-hot encoded feature variables | Conceptual Infographic Note

Tuesday, March 30, 2021

Wednesday, March 17, 2021

Monday, March 15, 2021

==========================================Python Program to understand Instance Variables==========================================

Wednesday, March 3, 2021

Tuesday, March 2, 2021

Friday, February 26, 2021

==========================================
Python Program to understand Instance Variables
==========================================