Wednesday, March 31, 2021

Rotating Data Tables / Matrices in R

 



 

* One can rotate a table, data frame, or matrix object so that the rows become the columns and the columns become the rows.

 

* To rotate a data table, data frame, or matrix object in R, one can use the t() command.

* The name t() is short for transpose.

 

* The example begins with a data frame that contains two columns of numeric data and whose rows are also named.

  



* The final object is transposed: the rows are changed to columns and the columns are changed to rows.

* Also, the new object is in fact a matrix rather than a data frame.
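
The original figure for this example is not reproduced here. The following is a minimal sketch of the same idea; the data frame `df`, its column names, and its values are made up for illustration.

```r
# Hypothetical data frame: two numeric columns and named rows.
df <- data.frame(
  mow   = c(12, 15, 17),
  unmow = c(8, 9, 7),
  row.names = c("site1", "site2", "site3")
)

# t() swaps rows and columns.
rotated <- t(df)

print(df)
print(rotated)

dim(df)             # 3 rows, 2 columns
dim(rotated)        # 2 rows, 3 columns
is.matrix(rotated)  # TRUE: the result is a matrix, not a data frame
```

The row names of `df` become the column names of `rotated`, and the column names become the row names.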

  

* One can see this much more clearly if one tries the same t() command over a simple vector .




* Vectors are treated like columns, but when they are displayed they look like rows.

* In any event, applying the t() command always returns a matrix object.

* One can easily convert the result back to a data frame using the as.data.frame() command.
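
A short sketch of the vector case and of the conversion back to a data frame. The vector `v` and the data frame `df` are hypothetical examples, not data from the original post.

```r
v <- c(4, 6, 8)            # hypothetical values

row_form <- t(v)           # a vector counts as a column, so t() gives a 1 x 3 matrix
col_form <- t(t(v))        # transposing again gives a 3 x 1 matrix

dim(row_form)
dim(col_form)

# Converting a transposed data frame back into a data frame.
df <- data.frame(a = 1:3, b = 4:6, row.names = c("x", "y", "z"))
back <- as.data.frame(t(df))

is.matrix(t(df))           # TRUE
is.data.frame(back)        # TRUE
print(back)
```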



 

Selecting and Displaying Parts of a Vector in R Language

 


   Selecting and Displaying Parts of a Vector

 

* Being able to select and display parts of a vector is one of the most important skills when working with vectors.

* If one has a large sample of data, one may want to see which items are larger than a particular value, which requires selecting only those items from the dataset.

* Alternatively, one may want to extract a series of values as a subsample for an analysis.

* Being able to select or extract the required parts of a vector underlies many more complicated operations in R.

* The typical scenarios that one comes across when selecting a chunk of data from a vector are:

 

·        extraction of the first item (a single item) from a vector;
·        selection of the third item (the nth item) from a vector;
·        selection of the first to the third items from a vector;
·        selection and extraction of all items from a vector;
·        selection of several items at once by combining their positions into a vector;
·        selection of all items that are greater than the value 3 (that is, items whose value is greater than or less than some particular number);
·        display of items that are either greater than or less than some set of numbers.
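
The scenarios listed above can be sketched as follows. The values are those of the data1 vector shown later in this post; the variable names and the particular positions and thresholds are chosen only for illustration.

```r
data1 <- c(4, 6, 8, 6, 4, 3, 7, 9, 6, 7, 10)

first_item  <- data1[1]                      # first item
third_item  <- data1[3]                      # nth item, here n = 3
first_three <- data1[1:3]                    # first to third items
all_items   <- data1[]                       # every item
picked      <- data1[c(1, 3, 5)]             # several positions combined with c()
above_three <- data1[data1 > 3]              # items greater than 3
outside     <- data1[data1 < 5 | data1 > 8]  # items below 5 or above 8

print(first_three)
print(picked)
print(above_three)
print(outside)
```

A logical condition inside the square brackets keeps the items for which the condition is TRUE, which is why the last two lines need no explicit positions.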








Other commands are also useful when extracting parts of data. The length() command returns the length of a given vector.

·        The length() command can also be used inside the square brackets to extract segments of data:

 

data[(length(data) - 4):length(data)]

The code above extracts the last five elements of the vector. The parentheses around the start position are required, because R evaluates the : operator before subtraction.
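
A brief sketch of this idiom, using the values of the data1 vector shown later in this post. It also shows what happens when the parentheses around the start position are left out, since `:` binds more tightly than `-` in R. The variable names are illustrative only.

```r
data <- c(4, 6, 8, 6, 4, 3, 7, 9, 6, 7, 10)
n <- length(data)

# Last five elements: positions (n - 4) to n.
last_five <- data[(n - 4):n]

# tail() from base R gives the same result.
same <- tail(data, 5)

# Without parentheses this is n - (5:n), i.e. positions 6, 5, 4, 3, 2, 1, 0.
wrong <- data[n - 5:n]

print(last_five)
print(wrong)
```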


·        The max() command can be used to get the largest value in the vector:

 

==========================================================

> data1

[1] 4 6 8 6 4 3 7 9 6 7 10

 

> max( data1 )

[1] 10

 

> which( data1 == max(data1))

[1] 11

==========================================================

 

# The which() command above shows the index, or position, of the largest value within the vector. The maximum value of all the elements in the "data1" vector is 10, and its position in the vector is 11.

·        The first command, max(), provides the actual value that is the largest in the vector, and the second command asks which of the elements is the largest.

·        Another useful command is seq(), which generates a sequence of values.

·        Using seq(), one can pick out items at regular intervals between a beginning and an end position. In other words, one may select the first, third, fifth, and so on items by giving the start, end, and interval values as parameters.

 

·        The general form of the seq() command can therefore be written as:

seq(start, end, interval)

·        Such selections work on character vectors as well as numeric vectors, as in the following example:

==========================================================

> data5

[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"

> data5[-1:-6]

[1] "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"

=================================================

* In the code above, the negative indices drop the first six items, so the result contains the last six strings, which are the three-letter abbreviations of the months July to December.
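
A small sketch of seq() used as an index, on both a numeric and a character vector. It assumes that data5 holds the twelve month abbreviations, which are available in base R as the constant month.abb.

```r
data1 <- c(4, 6, 8, 6, 4, 3, 7, 9, 6, 7, 10)
data5 <- month.abb                  # "Jan" ... "Dec", built into base R

# Positions 1, 3, 5, ...: start at 1, stop at the vector length, step by 2.
idx <- seq(1, length(data1), 2)

every_other_value <- data1[idx]
every_other_month <- data5[seq(1, 12, 2)]

# Negative indices drop items instead of selecting them.
last_six <- data5[-1:-6]

print(idx)
print(every_other_value)
print(every_other_month)
print(last_six)
```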

 

Tuesday, March 30, 2021

Working with Data Objects in R

   

       



* Data objects are the fundamental items that one works with in the R language.

* Carrying out analysis on one's data and making sense of the results are the most important reasons for using R. This article covers methods of working with a data set; understanding the results associated with R data objects is central to working in R.

* One can learn to use the different forms of data that R supports and how to convert data from one form to another.

* One can also learn the techniques of sorting and rearranging data in R.

* Some of the processes involved in manipulating data objects in R are as follows:

 

==========================

        Manipulating Data Objects in R

==========================


- When you collect data, the very first step is to get the data into R.

- After the data is imported, the first thing to do is to compute summary statistics in order to get an initial picture of the data.

- Based on the summary statistics, one may want further analytical results, which requires manipulating the data held in R.

- One may want to reorder the imported data into a newer and more informative arrangement.

- Other manipulation exercises can also be performed, such as extracting certain parts of a complex data object.

- There are many ways of manipulating data, and understanding them is one of the most important aspects of learning R: the more one knows about how R handles its objects, the better one can use R as an analytical tool.
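
The steps above can be sketched in a few lines. The `plants` data frame and its values are entirely hypothetical; in practice the data would more likely be imported from a file, for example with read.csv().

```r
# Step 1: get the data into R (typed in directly here).
plants <- data.frame(
  species = c("A", "B", "C", "D"),
  height  = c(12, 7, 15, 9)
)

# Step 2: summary statistics.
print(summary(plants$height))

# Step 3: reorder the rows, tallest first.
sorted <- plants[order(plants$height, decreasing = TRUE), ]

# Step 4: extract part of the data, here the rows with height above 8.
tall <- sorted[sorted$height > 8, ]

print(sorted)
print(tall)
```
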

Set-Difference Operations in RDBMS - an explanatory blog

   


            Set-Difference Operations in RDBMS

 

* The set-difference operation, denoted by "-", allows us to find the tuples that are in one relation but not in another.

* The expression (r - s) produces a relation containing those tuples that are in r but not in s.

* One can find all the customers of the bank who have an account but not a loan by writing the set difference in the following manner:
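
The original figure is not reproduced here. Assuming the depositor and borrower relations used elsewhere in this document, each with a customer_name attribute, the query would presumably be written as:

```latex
\Pi_{customer\_name}(depositor) \; - \; \Pi_{customer\_name}(borrower)
```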

 



* As in the case of the union operation, one must ensure that set differences are taken between compatible relations.

* For a set-difference operation r - s to be valid, the relations r and s must be of the same arity, and the domains of the ith attribute of r and the ith attribute of s must be the same.
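
As a rough analogy in R, the language used in the earlier posts, base R's setdiff() behaves like r - s on two vectors of names. The customer names below are made up for illustration and do not come from the original post.

```r
# Hypothetical customer names projected from each relation.
depositor_names <- c("Hayes", "Johnson", "Jones", "Smith")
borrower_names  <- c("Jones", "Smith", "Williams")

# Customers with an account but no loan.
account_only <- setdiff(depositor_names, borrower_names)

print(account_only)
```

Both arguments are character vectors, which mirrors the requirement that the two relations have the same arity and matching domains.
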

Wednesday, March 17, 2021

Cartesian Product Operation in SQL and RDBMS - concept discussion with example



  Cartesian Product Operation

 

* The Cartesian product operation, denoted by a cross (x), allows us to combine information from any two relations.

 

* One can write the Cartesian Product of Relations R1 and R2 as R1 x R2

 

* A relation is by definition a subset of a Cartesian product of a set of domains .

 

* From this definition, one can get an intuition for how the Cartesian product operation is defined.

* Since the same attribute name may appear in both R1 and R2, one needs a naming scheme to distinguish between the attributes.

* One can do this by attaching to each attribute the name of the relation from which the attribute originally came.

 

* For example, the relation schema for

r = borrower x loan is:

(
borrower.customer_name,
borrower.loan_number,
loan.loan_number,
loan.branch_name,
loan.amount
)

 

* With this schema, one can distinguish borrower.loan_number from loan.loan_number: both attributes have the same name, but they come from different relations, namely borrower and loan.

* This naming convention requires that the relations that are the arguments of the Cartesian product have distinct names.

* In general, if we have two relations r1(R1) and r2(R2), then r1 x r2 is a relation whose schema is the concatenation of the schemas R1 and R2.

The relation r1 x r2 contains all the tuples t for which there is a tuple t1 in r1 and a tuple t2 in r2 such that t[R1] = t1[R1] and t[R2] = t2[R2].
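
The behaviour of the Cartesian product can be sketched in R with merge(), which returns the Cartesian product of two data frames when by = NULL. The rows below are hypothetical sample data; only the attribute names come from the schema above.

```r
# Hypothetical sample tuples for the two relations.
borrower <- data.frame(
  customer_name = c("Jones", "Smith"),
  loan_number   = c("L-17", "L-23")
)
loan <- data.frame(
  loan_number = c("L-17", "L-23", "L-15"),
  branch_name = c("Downtown", "Redwood", "PerryRidge"),
  amount      = c(1000, 2000, 1500)
)

# by = NULL pairs every row of borrower with every row of loan.
# The suffixes disambiguate the shared column name loan_number.
product <- merge(borrower, loan, by = NULL,
                 suffixes = c(".borrower", ".loan"))

nrow(product)    # 2 * 3 = 6 rows
names(product)
print(product)
```

The suffixed column names play the same role as the relation-name prefixes in the schema above, although R appends the relation name instead of prepending it.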

 

* Suppose we want to find the names of all the customers who have a loan at the "PerryRidge" branch. One needs the information in both the loan relation and the borrower relation, starting from the following selection statement.

 

<< FIG - 01 >>

 



 

From the given statement, one can see that a selection on the attribute branch_name is applied to the Cartesian product of the borrower and loan relations, keeping only the tuples whose branch_name is "PerryRidge".

* Since the Cartesian product associates every tuple of loan with every tuple of borrower, if a customer has a loan at the "PerryRidge" branch, then some tuple in borrower x loan contains the name of that customer together with the matching loan. These tuples can be obtained by adding the condition borrower.loan_number = loan.loan_number to the selection.
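
The figure for this query is not reproduced here. Under the schema given earlier, the complete query (selection on the branch, selection on matching loan numbers, then projection on the customer name) would presumably read:

```latex
\Pi_{customer\_name}\Bigl(
  \sigma_{borrower.loan\_number \,=\, loan.loan\_number}\bigl(
    \sigma_{branch\_name \,=\, \text{``PerryRidge''}}(borrower \times loan)
  \bigr)
\Bigr)
```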

 

Graphic Representation of all the involved Processes in Data Processing in Data Analytics and Data Science

 


Union Operation in RDBMS - Fundamental Relational Algebra Concept

 


            Union Operation - RDBMS

        Fundamental Relational Algebra Concept

=======================================

* Scenario where a union operation over RDBMS tables could be used:

Consider a query to find the names of all the bank customers who have either an account or a loan or both in a bank.

* To find the names of the customers who have a loan, a deposit account, or both, the query needs information from both the "depositor" relation and the "borrower" relation.

* In order to find the names of all the customers with a loan in the bank, the query that could be used is:

 

{figure-01}

The expression above is a projection: it returns its argument relation, given within the parentheses, with only the listed attributes kept. Since we are interested in the names of the customers who have a loan, the relevant result is obtained by projecting the borrower relation on the customer name attribute.

* Similarly, if we want to know the names of all the customers with an account in the bank, this can be expressed with the following projection:

 




{figure-02}

 

 

* Therefore, to answer the query raised in the opening scenario, one needs the union of these two sets: all the customer names that appear in either or both of the two relations. This is obtained by applying the binary union operation to the two queries. The relevant relational expression can be written as:


 
    {figure-03}
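
The three figures referred to in this post are not reproduced here. Based on the surrounding description, and assuming the attribute is named customer_name in both relations, they presumably correspond to the following expressions:

```latex
\begin{align*}
&\Pi_{customer\_name}(borrower)  && \text{(figure 1: customers with a loan)} \\
&\Pi_{customer\_name}(depositor) && \text{(figure 2: customers with an account)} \\
&\Pi_{customer\_name}(borrower) \,\cup\, \Pi_{customer\_name}(depositor)
                                 && \text{(figure 3: customers with either or both)}
\end{align*}
```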

 

* The result of the query is a relation containing the relevant tuples from both relations.

 

* The resulting relations are sets from which duplicate values are eliminated .
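
The elimination of duplicates can be illustrated with base R's union(), which treats its arguments as sets. The customer names below are hypothetical.

```r
# Hypothetical customer names projected from each relation.
borrower_names  <- c("Jones", "Smith", "Hayes", "Smith")
depositor_names <- c("Hayes", "Johnson", "Jones")

# Customers with a loan, an account, or both; each name appears once.
all_customers <- union(borrower_names, depositor_names)

print(all_customers)
length(all_customers)
```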

 

In Hindsight:

In our example, we took the union of two sets, both of which consisted of "customer_name" values. In general, one must ensure that unions are taken between compatible relations, so that the result is well defined and free of duplicates.

* Points to note when using a union operation over relations:

If r and s are relations and one needs to find r U s, then the following conditions must hold for both relations:

1)   The relations r and s must be of the same arity, that is, they must have the same number of attributes.

2)   The domains of the ith attribute of r and the ith attribute of s must be the same for all values of i. Note that both r and s may be either database relations or temporary relations that are the result of relational algebra expressions, such as those in figure 1 and figure 2 of the article.