April 28, 2024

R Programming Full Course for 2023 | R Programming For Beginners | R Tutorial | Simplilearn



Published May 17, 2023, 1:20 a.m. by Naomi Charles


Are you looking for a comprehensive R programming course? Then you've come to the right place! Simplilearn's R programming tutorial will help you learn the basics of programming in R and start using it for statistical analysis, data visualization, and predictive modeling.

R is a powerful programming language that is widely used in statistical analysis, data visualization, and predictive modeling. It is a popular language among data scientists and statisticians.

If you're just getting started with R, our R programming tutorial for beginners is the perfect place to start. This course will teach you the basics of programming in R, including how to install R and RStudio, import data, perform basic statistical analysis, and create data visualizations.

Once you've completed the R programming tutorial for beginners, you can move on to our other R courses to learn more advanced topics. We offer a wide range of R courses, from beginner to expert level.

No matter what your level of experience is, we have an R course that's right for you. So what are you waiting for? Start learning R today!




R has become the language for statistical computing and graphics, and it is one of the most popular analytics tools. R was written by Robert Gentleman and Ross Ihaka at the University of Auckland, New Zealand.

R is free, open-source software that is commonly used for statistics, time series, classification, clustering, and other data science tasks. It is also widely preferred for data visualization because it has a collection of great packages. The availability of these R packages sets it apart from other programming languages.

By learning R, you can become a data scientist, statistician, data analyst, R programmer, or business analyst in sectors such as healthcare, e-commerce, retail, banking, and finance.

With more and more companies focusing on generating insights from data, R programming has grown significantly over the years. Some of the top companies using R include Google, Amazon, Twitter, IBM, Oracle, and Firefox.

R is constantly evolving and keeping itself ahead of the curve. R's vast community ensures that R does not become outdated or old-school, as its members keep adding new functionality and updates.

With that, let's have a look at the agenda for our R programming course for 2022. First, we will look into variables and data types. Then we will move on to logical operators, following which we will look into vectors, matrices, lists, and data frames. Then we will look into functions and flow control statements, followed by dplyr and tidyr for data manipulation. Next, we will look into the ggplot2 library for data visualization, and finally we will have a look at time series in R.

So let's get started. Let's see what R programming is and how it helps.

R is well known as a language of data science. If you look at the rankings from a survey of data mining experts, based on the software they have often used in their work, R is used more than Python when it comes to data science. Python is also used, but R is predominantly used for data science activities. It is an open-source programming language used for statistical computing, and it is one of the most popular programming languages today. It was inspired by S-PLUS and is similar to the S programming language. So when it comes to data science, R is a popular programming language across the globe.

It is free and open source, as I mentioned. It is optimized for vector operations, which we will learn about later. It has an amazing community, with 9,000-plus contributed packages that allow us to do almost anything using R.

Now let's talk about the features of R. As I said, it is an open-source programming language, so you can install R for free and start working straight away; you don't have to buy a licensed version or pay for the software. Non-coders can also understand and program in R, as it is easy to understand. It has various data structures and operators, and it can be integrated with other programming languages like C, C++, Java, and Python. It comes with various built-in packages and a lot of sample data sets, and it makes reporting the results of an analysis easier.

Before we start learning about variables, loops, and so on, it is good to know how to set up R. Go to r-project.org. Once you get to the home page of The R Project for Statistical Computing, click on "download R". That brings you to a page listing the CRAN (Comprehensive R Archive Network) mirrors, which are available at different URLs. I would choose the first one, 0-Cloud. Click on it and then, based on your operating system (Linux, macOS, or Windows), choose the matching download. I'm using a Windows machine, so I click on "Download R for Windows", and that takes me to a page that lists binaries for the base distribution. This is what we can use to work with R straight away. There is one more tool, RStudio, and we will see how to set that up shortly. The 0-Cloud link takes us to the best possible mirror for our location, from where we can download R.

So click on "base", and then download the installer by clicking on the download link. I have already downloaded it, so it is in my Downloads folder, and that's all you need. Then just double-click the installer and go through the instructions to set up R. The installer also lets you create a desktop shortcut, which I have already done on my machine. If I go in here, I see base R; click on it, and that opens the R console, which you can use to start working with R straight away.

There is one more tool, called RStudio, which is set up on top of base R and makes working with R easier. You can also start working in base R itself: it shows you the R console, and if you already have some R scripts, you can click on File, then "Open script", select a file, and click Open. That opens an editor with the script. In this one, for example, I load a library to use built-in data sets, summarize the data, and clean up, and we'll see all of this. But I would suggest using RStudio rather than just base R. Installing base R is still required, and you choose the build that matches your machine configuration; mine is 64-bit, so I chose 64-bit while setting up base R.

RStudio is basically an application (an IDE) that makes working with R easier. To install RStudio, go to the RStudio home page, or just search for "RStudio download". On the download page, click on "Download RStudio" and choose your version. You can go for the free version, RStudio Desktop: click on Download and then download RStudio for Windows, which I have already done. Then you run through the setup steps. For example, if I go to Downloads, find the RStudio installer, and double-click it, I say Yes at the prompt, and that takes me to the RStudio setup. Click on Next, choose the install location if you want to place it somewhere specific, and click on Next again. It then asks you to select the Start menu folder, so leave RStudio selected and click on Install. It starts installing into that location; in my case it already exists. You can click on "Show details" to see which files it is extracting. Once this is done, you will be able to use RStudio, and you can also add a shortcut to your taskbar. This might take a couple of seconds, so just wait for it to complete. RStudio is an easier way of working with R, and a lot of developers across the globe use RStudio for their data science or programming work in R. It is almost done, and now I can click on Finish.

So that part is done, and you can add RStudio as a shortcut. RStudio has consistent commands and a unified interface, it makes it easy to navigate and manage your work in R, and it is set up on top of base R. If I open it, that's my RStudio coming up. Here you see the console, where you give your commands and get text output. I can also choose File, then Open File, go to the location where I have downloaded some files, and open a script. Now you have your script, which has some commands, at the top left. At the bottom left you have the console, where you see the output. On the right side you have the Environment pane, which shows the variables you have created, and the Plots pane, where plots appear.

We can look at this script as an example. Here I am loading the built-in data sets: I place my cursor on that line and press Ctrl+Enter, and that loads the datasets package, as we can see in the console. There is a built-in iris data set, and we can use head to look at its first six rows: just place your cursor on that line and press Ctrl+Enter. We will look into this data set later; it is a default data set that you can easily find when you are working with R. You can also place your cursor on the summary line and press Ctrl+Enter, which shows you summary statistics for the iris data.

You can do a plot, and the plot appears in the Plots pane; you can maximize it or zoom to look at it in full screen. We will discuss what kind of information we can infer from plots later. When it comes to cleaning up, you can call detach on package:datasets, which we loaded earlier, with unload = TRUE, and run it with Ctrl+Enter. I can also clear whatever plots we created, and I can clear the console either from the Edit menu with "Clear Console" or with the shortcut Ctrl+L. So that's a simple way of starting to work with R by installing RStudio.
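The walkthrough above can be sketched as a short script. This is a minimal sketch rather than the exact file opened in the video: the variable names `first_rows` and `iris_summary` are introduced here for illustration, and the plotting call is left commented out because it needs a graphics device.

```r
# Attach the built-in datasets package (it is normally attached already).
library(datasets)

# head() returns the first six rows by default.
first_rows <- head(iris)
print(first_rows)

# summary() gives per-column summary statistics.
iris_summary <- summary(iris)
print(iris_summary)

# In an interactive session you could draw the scatterplot matrix:
# plot(iris)

# Clean up: detach the package that was attached above.
detach("package:datasets", unload = TRUE)
```

In RStudio, each line can be run on its own by placing the cursor on it and pressing Ctrl+Enter.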

So let's continue learning about working with R. The first thing we should learn about is variables in R. A variable, as in any programming language, is a way to store a data value, a vector or list of values, a data set, or any other object in R. It lets us conveniently refer to the value by name, saving us from rewriting the data value or object many times in our program. In R, variables are named storage locations that your programs can manipulate. A variable name can be a combination of letters, digits, periods, and underscores.

Valid variable names include simple names such as total and sum. You can also use dot notation: there are different naming or style conventions in R, and we can use a dot to separate the words in a descriptive name. A variable name can also start with a dot, and it can include numbers. Remember that R is case sensitive, so whenever we declare a variable we need to remember which case was used in its name. There are other conventions too, such as separating words with an underscore or using mixed case within the name. To summarize, variable names can only consist of letters, numbers, periods, and underscores; they must start with a letter or a dot, and a leading dot must be followed by a letter, not a number.

We can declare our variables, and we can also look at the type of a variable and the class it belongs to. There are also some invalid variable names on the slide, which are worth remembering. In this example, the assignment operator between x and 10 assigns a value to the variable x. You could do the same with .y, then assign z the result of a computation between x and .y, and finally print it. Let's see some examples in RStudio before we move further. As I said, we can use different naming conventions. For example, I could create a variable called model1 and assign something to it; I could assign any of the available data types. I assign it a string and press Ctrl+Enter, and that's my variable. I can call typeof on it to check its type, and it tells me it's character. I can also call class, and that shows me it belongs to the character class. We'll learn about data types later; for now we are just using the assignment operator. If I type model1, it shows me the value, but if I type Model1, R says the object was not found. Why? Because R is case sensitive: the variable we created was all in lower case, and the one we tried to call starts with an upper-case letter.
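A small sketch of assignment, type inspection, and case sensitivity follows. The values assigned here (`"linear regression"`, `5`) are hypothetical; the transcript does not show what was typed on screen.

```r
# Assignment with the <- operator.
x <- 10
.y <- 5          # a name may start with a dot followed by a letter
z <- x + .y
print(z)         # prints 15

# A character variable (the value is a made-up example).
model1 <- "linear regression"
typeof(model1)   # "character"
class(model1)    # "character"

# R is case sensitive: model1 exists, Model1 does not.
exists("model1")
exists("Model1")
```

Typing `Model1` at the console would stop with an "object not found" error, which is why the sketch uses `exists()` to show the difference without halting the script.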

So you can create variables in this way. I could also use something like hello_string as my variable name, where we are using an underscore, assign it a value, and then call it at any time to check its value. You could also use mixed case within the name, and that is also a valid variable whose value I can look at.

If we try to create a variable whose name starts with a number, what happens? If I type a name like that and try to assign it a value, say 100, it throws an error, because a variable name cannot start with a number. What if I start the name with a period and then a number? That fails too: when a name starts with a period, the rule is that the period must be followed by a letter and not a number. So I can remove the number, and that works perfectly fine.
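One way to check these naming rules without triggering a syntax error is to compare a candidate name with what `make.names()` would turn it into. The helper `is_valid_name` and the candidate names below are hypothetical, introduced only for this sketch.

```r
# A string is a syntactically valid name when make.names() leaves it unchanged.
is_valid_name <- function(name) {
  identical(make.names(name), name)
}

candidates <- c("hello_string", "helloString", ".pairs",
                "2nd", ".2way", "first num")

# Named logical vector: TRUE for valid names, FALSE otherwise.
validity <- vapply(candidates, is_valid_name, logical(1))
print(validity)
```

The first three names pass; the last three fail because they start with a digit, start with a dot followed by a digit, or contain a space.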

These are naming conventions that you will pick up as you practice. So I can create a variable such as .pairs and assign any value to it, but always remember: if the name starts with a period, the period must always be followed by a letter. One more rule that is always followed in a real-world environment is that variable names cannot contain spaces. For example, if I type first num and try to assign it a value, it fails, but I can write it with an underscore instead, and that works perfectly fine; I can then call the variable to see its value. One more standard practice in real-world environments is to give variables meaningful names. For example, if I create a variable named bird and assign it the value tiger, it works fine, but it does not really make sense, and that creates a lot of ambiguity in our code. It is better to name the variable animal, since a tiger is an animal; that not only lets me assign a value to the variable but is also a little more meaningful.

When we talk about variables, it is also good to know the different data types that are available in R. Like any other programming language, R supports different data types.

You have the logical data type, with the values TRUE and FALSE. You have numeric values, which are numbers like these. You can also create integers, such as 3L and 40L, and so on. You can have complex numbers; you can have characters, which can be a single letter, a set of letters, or anything within quotes; and you can even have raw data. So these are the different data types. Let's see some quick examples. As we saw already, when we created model1 it was character. Now I can assign 100 to x, and this is not going to be an integer. Let's check: by default it is double. If I want an integer, I write the number with an L suffix, and I can check it with typeof: this one is an integer. Similarly, you can have character, complex, raw, and numeric values; all of these are different data types. I could also create a Boolean: I assign TRUE to a, and now when I check the value of a, it is TRUE.
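The data types listed above can be checked with `typeof()`. This is a minimal sketch; all values are arbitrary examples, and the variable names other than `x` and `a` are introduced here for illustration.

```r
x <- 100               # a plain number is stored as double by default
n <- 100L              # the L suffix requests an integer
a <- TRUE              # logical
z <- 3 + 2i            # complex
s <- "hello"           # character
r <- charToRaw("A")    # raw bytes

# Collect the storage type of each value.
types <- vapply(list(x, n, a, z, s, r), typeof, character(1))
print(types)
```

The printed vector reads double, integer, logical, complex, character, raw, in that order.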

We will learn about logical operators, where we use the values assigned to variables to compare or compute across different variables. So this is a small, simple example of using variables. We have seen how to use variables, how to use the assignment operator to assign values to them, and the different naming conventions. We can also use the different data types that R supports when working with variables.

Now that we have learned about variables and data types, let's first learn about operators and how they can be used in R. We might want to do calculations on numeric values, find differences between values, or compare values, and for that we use different kinds of operators. We have arithmetic operators, relational operators, and logical operators. Before we go straight to logical operators, let's understand the basics, such as the arithmetic operators. Let me pull up a Notepad file here.

When we talk about arithmetic operators, we are talking about addition (+), subtraction (-), multiplication (*), division (/), remainder or modulus (%%), and exponent (^).

What also matters when you use arithmetic operators is the order of operations. Parentheses always take priority. Next comes the exponent, if the computation involves one, followed by multiplication and division, which are evaluated left to right, whichever comes first. Similarly, addition and subtraction come last and are evaluated left to right, whichever comes first. So these are the arithmetic operators, and now we can quickly see some simple examples.

For example, I can type 100 + 100, and that gives me the value. You can do 100 - 50, you can multiply 100 by a number, you can divide 100 by 2, or you can use the modulus operator. A single percent sign gives me an error here, so let me add one more percent sign: the operator is %%, and it tells me what the remainder is.

If we want to look at the ordering when using these arithmetic operators, here is an example. If I type 34 + 46 / 2, I get 57. However, if I put 34 + 46 in parentheses, which get priority, and then divide by 2, my result is different: 40. So understanding which arithmetic operators you can use, and the order in which they are evaluated, is very important. We can use all of these arithmetic operators, and to control the ordering we can use parentheses; otherwise the computation follows the precedence of the operations involved, whether multiplication or division, addition or subtraction. At any point, I can press Ctrl+L, and that clears my console.
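The arithmetic operators and the effect of parentheses can be tried directly at the console. In this sketch the operands other than 34, 46, and 2 are arbitrary, and the names `without_parens` and `with_parens` are introduced for illustration.

```r
100 + 100    # addition
100 - 50     # subtraction
100 * 2      # multiplication
100 / 2      # division
100 %% 3     # modulus: remainder of 100 divided by 3
2 ^ 3        # exponent

# Division binds tighter than addition, so 46 / 2 is evaluated first.
without_parens <- 34 + 46 / 2     # 34 + 23 = 57
# Parentheses force the addition to happen first.
with_parens <- (34 + 46) / 2      # 80 / 2 = 40

print(without_parens)
print(with_parens)
```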

Let's continue our learning and look at the remaining operators. Arithmetic operators let us do computations, but we also have relational and logical operators, which help us compare values or find differences between values, whether those are groups of values or individual values. With relational and logical operators you can compare data values: whether they match or do not match, or whether they are above, below, or equal to something, and so on.

Among the relational operators we have greater than (>), less than (<), greater than or equal to (>=), less than or equal to (<=), equal to (==), and not equal to (!=). Among the logical operators we have AND (&), OR (|), and NOT (!).

AND combines two conditions: it returns TRUE if both conditions are true; otherwise it returns FALSE. For example, take 10 > 20 AND 10 < 20. Both cannot hold, and since we are checking whether both conditions are true, the result is FALSE. If I replace the AND with OR, it returns TRUE even if only one of the conditions is true. You can also use the NOT operator, which takes each element of a vector and gives the opposite value. We can use any of these operators in our computations.

Let's see some examples of these operators. You could either assign values to variables and check them, or pick up a data set from your machine and use the operators on it. For example, say x has been assigned 100 and y has been assigned 200. If I check x == y, R compares the values and tells me that's not true; it is FALSE. If I negate the comparison, for example with x != y, it tells me TRUE. So I can check simple conditions like this. I can ask whether y is greater than x, and that tells me yes, it is TRUE. If I check whether y is greater than or equal to x, it still says TRUE.
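The comparisons described above look like this in R. The `x` and `y` values match the transcript; the result names (`both`, `either`, `flipped`) are introduced here only so the outcomes can be inspected.

```r
x <- 100
y <- 200

x == y     # FALSE: the values differ
x != y     # TRUE
y > x      # TRUE
y >= x     # TRUE: "greater than or equal" holds when either part holds

# AND needs both conditions to be TRUE; OR needs at least one.
both   <- (10 > 20) & (10 < 20)
either <- (10 > 20) | (10 < 20)

# NOT flips each element of a logical vector.
flipped <- !c(TRUE, FALSE, TRUE)

print(both)
print(either)
print(flipped)
```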

That is because greater than or equal to is satisfied when either part holds, and here y is greater than x. We can also work with a data set, and for that I can pick one of the data sets from my machine. I have some data sets here, and I am interested in taking this auction data set and loading its values, so I'll copy its path. I can use auction as my variable name, or I could give a dot-separated name such as auction.data if that is what you want. Then I assign the variable a value using read.csv, giving it the path to the file, auction.csv. When working on a Windows machine, the backslashes in the path need to be doubled. I could pass other arguments too, such as header = TRUE, the separator, or how to take care of missing values; we can look at all of those later. So here I'll add the extra backslashes and press Ctrl+Enter. Now I can look at the values by just typing auction.data, and I can see that it has a lot of data. I could also have used other functions, which we will see later, such as head, to see just the first few rows. So we can assign data to a variable and continue working on it.
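A minimal sketch of loading a CSV file with `read.csv()` follows. Since the real auction.csv is not available here, the sketch writes a tiny stand-in file first; its columns and rows are made up, and the Windows path in the comment is a placeholder.

```r
# Create a small stand-in CSV file (hypothetical contents).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("bidder,bid",
             "alice,100",
             "bob,250",
             "carol,100"), csv_path)

# On Windows, a literal path needs doubled backslashes or forward slashes,
# for example "C:\\data\\auction.csv" or "C:/data/auction.csv".
auction.data <- read.csv(csv_path, header = TRUE, sep = ",")

# Look at the first rows instead of printing the whole data set.
print(head(auction.data))
```

In RStudio, `View(auction.data)` opens the same data in a tabular viewer.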

Let's keep it simple and repeat this step, this time with auction as my variable name. I can also call View on auction, which shows me the data in a tabular format so that I can look into it and understand it. Then I can use it to work with variables. Say I want to assign to x a subset of my auction data set. I use auction, then a dollar symbol, and then the column I'm interested in, for example bidder, and compare it with a value; let's pick a bidder name from the data. I could assign all the matching rows to x, or I could add another condition: auction, dollar, bid, equal to 100, followed by a comma. When I first try this, it gives me a problem, because we did not use the right operator to combine the conditions, so let's use AND. Now x is assigned the rows of auction where bidder is that name and bid is 100. Once we do this, I can look at the value of x, and that shows me the result. This is a simple example of using a logical operator. Instead of AND, I could have used OR, which is written with the pipe symbol. If I run that and look at the values of x, it shows me a lot more rows, because an OR condition matches rows that satisfy either one of the conditions. In this way we can use logical operators and continue doing our computations.
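The row filtering described above can be sketched with a small in-memory data frame. The data, the bidder name `"alice"`, and the result names `both` and `either` are hypothetical; only the column names bidder and bid come from the transcript.

```r
# Stand-in for the auction data set (made-up rows).
auction <- data.frame(
  bidder = c("alice", "bob", "alice", "carol"),
  bid    = c(100, 100, 250, 300),
  stringsAsFactors = FALSE
)

# AND: rows where both conditions hold. The trailing comma keeps all columns.
both <- auction[auction$bidder == "alice" & auction$bid == 100, ]

# OR: rows where at least one condition holds, so more rows match.
either <- auction[auction$bidder == "alice" | auction$bid == 100, ]

print(both)
print(either)
```

With this data, the AND filter keeps one row and the OR filter keeps three.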

Let's learn about print formatting and how print can be used to view your data. R uses the print function to display variables. For example, if I have assigned the number 10 to x, I can call print(x), and that shows me the value of x. The 1 in square brackets that we see in the output also has a meaning: the value is a vector, and [1] is the index of the first element shown. We'll learn about vectors later. R uses the paste and paste0 functions to combine strings and variables for printing in a few different ways. For example, if I call print with paste and pass in two strings, such as hello and world, they are printed joined by a space. I could also call paste with a separator, and then the words are joined by that separator. If I use paste0, there is no space between the two words, or, say, three words.
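The three printing variants described above look like this. The separator `"-"` and the result names are assumptions made for the sketch; the slide's actual separator is not given in the transcript.

```r
x <- 10
print(x)    # output starts with [1], the index of the first element

# paste() joins its arguments with a single space by default.
joined_default <- paste("hello", "world")

# sep changes what goes between the pieces.
joined_sep <- paste("hello", "world", sep = "-")

# paste0() joins with no separator at all.
joined_none <- paste0("hello", "world")

print(joined_default)
print(joined_sep)
print(joined_none)
```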

Let's see some basic examples of print in RStudio. Here x is followed by the assignment operator, which we already discussed, so I am assigning a value to x. I place my cursor on the line and press Ctrl+Enter, and the value has been assigned. Now let's look at the value of x. I could also print x explicitly by using the print function. Similarly, I assign hello to message and then print the message using print. If I just run an assignment like this one, it is not going to print anything until I call the variable or use the print function. For example, if I type y, auto-printing shows us the value, or I could do explicit printing by using the print function. Whenever we see this [1], as I mentioned, it means y is a vector and 5 is its first element. You can also use the colon operator to create integer sequences; we'll learn about sequences and lists later, but this is just a simple example. I am creating an integer sequence that starts at 10 and ends at 30, so it has 21 elements. Let's look at the values of our sequence of integers. At any point you can use class to look at the class of x, and that shows me the class is integer.
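A short sketch of auto-printing, explicit printing, and the colon operator follows. It assumes the sequence in the video was written as `10:30`, which matches the stated start and end values.

```r
y <- 5       # an assignment on its own prints nothing
y            # auto-printing: typing the name shows the value
print(y)     # explicit printing gives the same output

# The colon operator builds an integer sequence, both ends included.
x <- 10:30
print(x)
length(x)    # number of elements in the sequence
class(x)     # class of the sequence
```

Because both end points are included, `10:30` holds 21 values, and its class is integer.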

Looking further at the data types we learned about a few minutes ago, R has five basic, or atomic, classes of objects: character, numeric (real numbers), integer, complex, and logical.

Let's spend some time understanding some basic arithmetic operations and how to do them in R. Here I have opened RStudio with some basic examples of arithmetic operations. We can add two numbers: I place my cursor on the line and press Ctrl+Enter, and that shows me the sum. I can do subtraction, multiplication, and division, raise a number to a power, or use modulo, which returns the remainder. When performing operations, we can also change the order of operations by using parentheses. Here I put 500 * 2 in parentheses, plus 80 / 2. It first evaluates what is in the parentheses, and that's why I get 1040. Similarly, I can change the order of operations by writing 500 times something in parentheses; the parenthesized part is evaluated first, and hence the result is 1500. We have already discussed the assignment operator, and here we use it to assign values to variables. For example, I create a variable called selling and assign it a value, and similarly for cost, and then we do a calculation: profit is selling minus cost. Here I can look at the value of profit, which shows me 250.
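The parenthesized expression and the profit calculation can be sketched as follows. The values of `selling` and `cost` are hypothetical, chosen only so that the difference is the 250 mentioned above; `grouped` is a name introduced for illustration.

```r
# Parentheses first, then division, then addition: 1000 + 40.
grouped <- (500 * 2) + 80 / 2
print(grouped)

# Hypothetical amounts; the video does not state the actual values.
selling <- 1000
cost    <- 750

# Variables can be used in a computation just like literal numbers.
profit <- selling - cost
print(profit)
```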

now let's also spend some time in

understanding data types in our so we

can have different types

of data so

this one shows me an example of

assigning a decimal value which is part

of a numeric class so i can do this

and then if i would be interested in

seeing the value of num so i can just

look at the value of num

if i would be interested in looking at

the type of num so i can do that here by

just typing in typeof

and

then select this one and pass in your

num and it shows me the value is double

i can also look at what class it belongs

to and that shows me

it is numeric

so in this way you can not only assign

values to a variable but you can look at

the class and type of it now here we can

assign whole numbers which are also

known as integers now if i look at the

type of this it shows me double so if i

would want to explicitly assign an

integer i could have done for example i

let's say j

and i could have used the assignment

operator and i could have done this

and then if i look at the value of j it

shows me the value but what we would be

interested in looking at the class of j

so we can do this and it shows me it is

an integer so explicitly either i can

assign this by using a capital l

or i could use a function called

as dot integer so we'll see that later
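The type and class checks above might look like the following sketch. The variable names follow the walkthrough where it gives them; the values and the name k are assumptions.

```r
num <- 10.5          # assumed decimal value
typeof(num)          # "double"
class(num)           # "numeric"

i <- 5               # a whole number without L is still stored as double
typeof(i)            # "double"

j <- 5L              # the capital L suffix makes it an integer
class(j)             # "integer"

k <- as.integer(5)   # explicit conversion with as.integer
class(k)             # "integer"
```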

now we can also assign boolean values or

basically your logicals so here we

assign true and then we do a false

and we can look at the type of t and

that tells me it is a logical class

now similarly you might be interested in

working on

text or string values and here we can do

this by saying

ch and then passing in a value look at

the class of this it tells me it is

the data type is character and if you

look at the type of it it shows me

character

similarly r also supports complex data

types so we can do that too by just

doing this and look at the class of it

it tells me it is complex and you can

also pull out the length of this by

now here we are doing a length on

the character so let's look at this one

and it shows me what is the length of

this
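A sketch of the logical, character and complex examples. The values are assumptions, since the captions do not record what was typed.

```r
t <- TRUE
f <- FALSE
class(t)         # "logical"

ch <- "hello"    # assumed text value
class(ch)        # "character"
typeof(ch)       # "character"

cplx <- 2 + 3i   # assumed complex value
class(cplx)      # "complex"

length(ch)       # 1: number of elements in the vector
nchar(ch)        # 5: number of characters in the string
```

Note that length counts elements, not characters, so a single string has length 1; nchar is the function that counts characters.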

now

one of the useful functions which we

usually use in r is print now i can

simply do a print hey and that prints

whatever value is passed to print i can

assign a value to a variable and then

print it so that is also fine you could

have also without using function just

type y and that also shows the value

however sometimes using print as an

explicit function can be useful it makes

your code more readable now here we

would use an inbuilt data set that is

mtcars

and then if you would want to print the

data set that shows me the values which

shows me

the car models and different other

features such as mileage cylinder

horsepower and so on now one of the use

cases of print

with a paste function can also be seen

here so i'm doing a print paste and that

basically prints whatever was passed in

a concatenated way i could also do a

print paste with a separator

if i would want to format my data in a

particular way so here i've used

separator as comma

there is one more function paste 0 which

can be used so i'm just doing here paste

0 and that tells me just concatenate

these values without any space so paste

0 shows no space between these two

elements which were passed now we can

explicitly do some printing and for that

i'm using the sprintf

option

i am going to pass in percentage s which

is for string and percentage f for float

and we can print the values of this so

these are some basic operations or usage

of your functions to

basically do some computations or look

at your results
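The printing functions covered above, as a rough sketch. The strings and numbers are assumptions, and head is used here only to keep the mtcars output short.

```r
print("hey")

y <- 42                 # assumed value
print(y)                # explicit printing
y                       # auto printing

print(head(mtcars))     # first rows of the built-in mtcars data set

a <- paste("hello", "world")              # "hello world"
b <- paste("hello", "world", sep = ",")   # "hello,world"
d <- paste0("hello", "world")             # "helloworld"
print(a)

# %s formats a string, %f a floating point number
s <- sprintf("%s scored %f", "sam", 7.5)  # "sam scored 7.500000"
print(s)
```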

so when you talk about basic type of any

r object it is a vector

and when we talk about vectors empty

vectors can be created with vector

function

a vector can contain objects of same

type or a class now when we talk about

list

list is a vector which contains objects

of different classes

so these are some basic examples so

apart from your print formatting we can

be looking at what we call as r

objects such as vectors or lists and so

on so when we talk about vectors it is a

sequence of data elements of same basic

type

we use

the c function to declare a vector so we

can always do a c function to declare a

vector

for example here we are creating a

variable v 1 and we are assigning it a

vector by using c and then giving some

basic type so numbers 1 to 5 or for

example words you can always do a print

or you can also use a class

to find out what is the class

of the elements or

the values which have been passed to the

particular object so we can look at some

examples like this for example

we can see here so list

is a vector which contains objects of

different classes

so you can have numeric objects so that

is your numbers such as 1 2 etc

which are your numeric values for

example here what we are doing is we are

assigning a value 1

to a

and that

can then be used i can either do a print

or i can just use auto printing i can

also do

here a value for a i

or i could be doing something like this

which shows me 0

which can be for missing value so if i

would want to

use auto printing i can just call a and

it shows me the value what has been

assigned to it you can always use typeof

to look at the type of a which is

double by default and if i look at type

of a i

that is basically an integer because we

used l here

so in this way we can continue working

with say

our different classes

of objects so for example let's create a

vector here so i can say v1 and then

basically assign it by using a c

function

and then pass in the values to this one

and that basically

gives me a variable and you can look at

what are the values assigned to it now

if i look at the class of v1

that shows me it is numeric if you use

typeof

and then you would want to see the

values of v1 that shows me the values

are double now as we were seeing here we

can be looking at the class so for

example if i create one more

variable and then assign values to it

using c

so

passing in some words here

for example let's go and say hello world

and then i can basically

do this and look at the values of this

one i could also explicitly print as we

discussed earlier

by doing a print v2 we could also be

having a paste function

if we would want to use that so for

example if i would do a paste function

i could be using

and this is missing a bracket so let's

complete this

and that shows me the value i could have

also used for example

paste 0 function

and that also works fine

so it depends on what we are looking at

here so if i look at class

of v1 which we had it is numeric and v2

is basically

having elements which are of the class

character

so this is just a simple example of

having

your print functions having vectors

created printing out the values of those

printing out class and type of these

to continue our learning on vectors

as i mentioned earlier we can use the c

function

which can be used to create vectors of

objects by concatenating things together

so for example if we look at this

one which says x and then i use c

function and i say

0.5 and 0.6 so we can have a vector of

numeric types

so let's

do this

and then we can look at the value of x

so it shows me my vector which has 0.5

and 0.6 i can also have

my vector of logical values

and now let's look at the value of x so

it has true and false

or we could have done it in this way

where we can then look at the values so

we can use the short form by using

capital t and f

i can create a vector

with character types and then look at

the values of those

i can also be

creating a sequence of integers as we

saw in previous example and then look at

the values which start at 9 and end at

29.
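The vectors created so far with the c function could be written roughly as below. The walkthrough reuses x each time; the separate names here are hypothetical, chosen so every vector stays available.

```r
x_num   <- c(0.5, 0.6)         # numeric
x_lgl   <- c(TRUE, FALSE)      # logical
x_short <- c(T, F)             # logical, short form
x_chr   <- c("a", "b", "c")    # character (values assumed)
x_seq   <- 9:29                # integer sequence from 9 to 29
x_cplx  <- c(1 + 0i, 2 + 4i)   # complex (values assumed)

x_seq
class(x_seq)                   # "integer"
```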

now you can also create with complex

types and look at the values so these

are some simple examples of creating

vectors now we can also use vector

function to initialize vectors

so for example if i would do this

where i am saying my vector will be of

type numeric length is 10 and then look

at the values so it just shows me

a vector which has

all zeros and the length is 10. now you

can create a vector of numbers

by doing this as we saw in previous

example and use explicit printing to

look at the values or might be letters

and then use the print

function to basically look at the values

of the vector now we can also try

concatenating the above two

so that creates a mixed vector which has

two different kind of types here so i

can do a mixed vector by using the c

function and then passing in my numbers

which has numeric types and letters

which has character types

and then we can basically do a printing

of this which shows me the value but

here what we see is coercion that is

basically

casting if you would know as the word in

different programming languages so it

basically coerces the numbers to

character as characters cannot be

coerced into numbers

and then you can print the values of

this mixed vector where everything is of

character types so for example

at this point of time if i would have

done something like class

of mixed

vector

and if i would want to look into the

values of this one it shows me

everything is of character types here
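A sketch of initializing a vector and of the implicit coercion just described. The names zeros, nums, ltrs and mixed, and the values, are assumptions.

```r
zeros <- vector("numeric", length = 10)   # ten zeros
zeros

nums <- 1:5                  # assumed numbers
ltrs <- c("a", "b", "c")     # assumed letters

mixed <- c(nums, ltrs)       # numbers are coerced to character
print(mixed)                 # "1" "2" "3" "4" "5" "a" "b" "c"
class(mixed)                 # "character"
```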

now

data type of different vectors can be

returned by the function class as we saw

just now so it is common to use the

class function

to interrogate an object

asking what is the class

now we can create one dimensional object

such as an integer vector which we have

done earlier and then look at the class

of it which tells me it is an integer i

can also create a numeric vector

by giving in some values here

so when we do this

so i have given the vector function c

and then giving in the value and look at

the class it shows me it would have

numeric values

now you can create a character vector

and then basically look at the values of

it now at any point of time in all of

these for example if i would do num

i can see what are the values assigned

to it

i can do letters

and i can see the values of this so let

me just create some space here now i can

create a factor vector

and then look at the values of it or

also you can see

what is the value in this factor vector

so here

we said as

dot

factor so factor function is being used

here and we are creating a vector of

letters

and then we look at the class

we also look at the values what are

assigned to this

or what are in this particular vector so

if you look into all of these vector

examples

initially we were

using an assignment operator where we

were using the c function and when we

started creating vectors by say

concatenating or vectors of particular

types we are using equals here and that

also is fine now looking further

when we look at concatenating two

different kind of vectors so for example

here we have

say

numbers and letters

as we discussed earlier

it will do coercion that is change

one type

into other
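The one dimensional objects discussed around here (integer, numeric, character and factor vectors) can be sketched as follows. All names and most values are assumptions; the 10.5 comes from the walkthrough.

```r
int_vec <- 1:10
class(int_vec)               # "integer"

num_vec = c(1:10, 10.5)      # = also works for assignment here
class(num_vec)               # "numeric": the integers were coerced

chr_vec <- as.character(1:10)
class(chr_vec)               # "character"

fac_vec <- as.factor(c("a", "b", "a", "c"))   # assumed letters
class(fac_vec)               # "factor"
levels(fac_vec)              # "a" "b" "c"
```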

now when we talk about one dimensional

objects we can have integer vectors

or say float which we saw just now

ending at

10.5 so when we say c 1 is to 10

it basically starts with 1 but then

there is also you can say a coercion

happening here and then you have the

values ending at 10.5 that is float

and i can look at the class of it

and when we did a class of

did we do a class here so let's come

here and let's do a class of this one it

shows me it is numeric you can look at

the values of it

similarly you can create a character

vector which is

1 to 10 and then basically look at the

class of it or basically the value of

this vector

or as we did the factor vector now for

two dimensionals we will explore that

when we are learning about matrix so as

of now let's forget that now when you

talk about mixing objects there are

occasions when

classes of r objects get mixed

together so that could be accidentally

or that could be intentional so if you

look at this example here we have y

which has been given values which is 1.7

and a

and at this stage if i would look at the

value of y

that's my vector

if you look at the class of y

that shows me

it is

character now when you look at some

other examples so let's pass in

logical and numeric values

what would happen in this case so we can

again

use

class

of y

and that basically has numeric

and if you would want to look at the

value of y that shows me

1 and 2 here

let's go further

so

let's look at the value of this one so y

and then basically see what is the value

of y so it is a

true

and you can also look at the class of it

now we are mixing objects of two

different classes in a vector remember

when we talk about

vector we always talk about vector

having elements of same type

but when we talk about lists which we

will learn later

that would have

basically or that can have

your each element of different type

so for vectors it is not allowed so when

different objects are mixed in a vector

coercion occurs so that every element

in the vector is

of the same class
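The three mixing cases above, as a sketch. The walkthrough reuses y; the names y1, y2 and y3 are hypothetical so each result can be inspected afterwards.

```r
y1 <- c(1.7, "a")     # numeric + character
class(y1)             # "character"
y1                    # "1.7" "a"

y2 <- c(TRUE, 2)      # logical + numeric
class(y2)             # "numeric"
y2                    # 1 2

y3 <- c("a", TRUE)    # character + logical
class(y3)             # "character"
y3                    # "a" "TRUE"
```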

now

we have seen earlier the implicit

coercion where r tries to find

a way to represent all the objects or

elements as i say so all the objects in

the vector in a reasonable fashion so we

can also be doing explicit coercion

so that is

from one class to another by using a as

dot and then using a relevant function

so if i have x here now if i look at the

class of x it tells me it is an integer

but i can convert that to numeric by

doing a as dot numeric or as dot logical

or as dot character to basically do a

coercion and change the class of the

objects now if r cannot figure out how

to coerce an object this will result in

nas being produced which we can also

relate to missing values or not

applicable values so for example if we

create x and look at the class of x it

tells me it is character let's try

changing character to numeric which will

not work and it says n a's are

introduced if you do it even in logical

that would not work and it shows me na

values or if you do a complex it says

na values have been introduced so at this

point of time if i look at the value of

x it tells me

it was assigned a b c and we try to

convert that into a different class
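Explicit coercion with the as dot functions might look like this sketch. The integer values 0:6 and the result names are assumptions; suppressWarnings is used only to keep the output quiet.

```r
x_int <- 0:6                 # assumed integer values
class(x_int)                 # "integer"
as.numeric(x_int)
as.logical(x_int)            # 0 becomes FALSE, other numbers TRUE
as.character(x_int)

x <- c("a", "b", "c")
class(x)                     # "character"

# text that is not a number cannot be converted, so NA is produced
n1 <- suppressWarnings(as.numeric(x))   # NA NA NA
n2 <- as.logical(x)                     # NA NA NA
n3 <- suppressWarnings(as.complex(x))   # NA NA NA

x                            # unchanged: "a" "b" "c"
```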

now when we talk about vectors it is

also good to know about attributes in

brief

so

all your r objects have attributes that

is metadata for object so when you talk

about r object attributes you could

have names you can have dimension names

you can have the dimensions that is

matrices and arrays you can look at the

classes such as integer numeric and so

on and you can also look at length and

other user defined attributes so if i say x

we are assigning a value to x now at

this point of time if i see my value to

x is 1 but then all objects need not

necessarily have attributes so in that

case whenever you try to use an

attributes function

that would return null so

at this point of time if i look at the

attributes of x

it shows me null value so these are some

of the basics which

help us in working with r

and using your vector function

or

looking at the coercion which is

implicitly happening or explicitly can

be done by us by using some as dot

function
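A short sketch of the attributes check mentioned above. The name "first" is an assumption added to show what a non-null result looks like.

```r
x <- 1
x
before <- attributes(x)    # NULL: a plain vector has no attributes set

names(x) <- "first"        # assumed name
after <- attributes(x)     # a list holding the names
after
length(x)                  # 1
```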

now let's learn about

lists

and how we can work

using r on list

when we talk about vector which we saw

in previous examples

vector is a one-dimensional array right

and it can hold elements only of same

type so we would say vector is more of

one dimensional but when you talk about

list list is a generic vector that can

contain objects of different types

so when you talk about say for example

matrices matrices can also hold elements

of same type but

in matrices it is a two-dimensional

array we will talk about matrices also

we will learn so when you talk about

lists they can contain all kind of r

objects so you can have dates you can

have data frames you can have vectors

and many more so in list

there is no coercion which is required

that is changing of data type there is

no loss of functionality and lists do

not follow any predefined structure now

we can create lists using this list

function as it is shown here so you can

create a variable and then assign a list

to it where you can be

using either passing in a vector or what

you can do is you can simply create a

list by using this

list function so let's see some example

here now for that what we can do is i

can bring up my r studio

where we can see an example on list and

how it works so when you talk about list

what you can do here is

let me close this one

and this one yeah

so what we can do is we can basically

say for example test

and i can basically

give

something here so for example i can say

music tracks

and then i can say how many hundred of

them

and i can say

let's give 100 as number

and then we can say how many of them got

five stars

and

i can do this so i can

check this and this shows me all the

objects or elements of this

list right

now

when we do this what we are doing is we

are creating a vector right and vector

basically

can have coercion depending on what are

the elements which are passed because

whenever you use the c

and you create a vector it will only

accept

elements of the same type so for example

if i do a class on test

it shows me

here all the objects are of type

character right and you can also use

typeof

to check

for

our test variable and

it is basically having all the objects

as character now how would you create a

list so what we can do is we can use a

list function so for example let's again

do a test here but this time i'm

interested in creating a list

and list can have objects of different

types so let's say music

tracks and then i can just give hundred

and i can say with rating five

and now if i look at my test it shows me

all the elements of your particular list

here

we see each element or each object with

a double bracket and we can see each

element

now what we can also do is we can use is

list

function and then we can pass in test

here to check what is it and it is a

list right

so here we have created a list but if

for example we take the previous example

where we were creating a vector

and if i would do a is list it would

show me false right so we just created a

simple list
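The difference between c and list in this example, as a sketch. The name test_vec is hypothetical; the walkthrough reuses test for both.

```r
test_vec <- c("music tracks", 100, 5)
class(test_vec)       # "character": 100 and 5 were coerced to text
is.list(test_vec)     # FALSE

test <- list("music tracks", 100, 5)
test                  # elements are shown under [[1]], [[2]], [[3]]
is.list(test)         # TRUE
class(test[[2]])      # "numeric": the number keeps its own type
```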

and we can also arrange labels

or

we can

use a name function to basically give

names so what i can do here is let me

create a list first

so i can do that like this

and now what i can do is i can do a name

and i can use a name function to this

test

and then basically what i can do is i

can pass labels

so here i can just

give some names here

so for example i can say

let's give it a name

product

so say we are talking about product of a

company

and then we can say

here i can give

count

and here i can give rating

and this is basically to give names so

that just gives

some error here let me just check this

so let's use this

name function here and what i'll do is

i will basically

use names

and now let's

do a test

so that shows me the names what we have

assigned

to

our

list objects
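Naming the elements of an existing list with the names function, sketched with the labels used above.

```r
test <- list("music tracks", 100, 5)

# assign one label per element
names(test) <- c("product", "count", "rating")

test          # elements are now shown under $product, $count, $rating
names(test)
```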

now

we can always access the elements of our

objects from a list

using indices or even using double

square brackets so for example i have test here

and basically i can give something like

this

which gives me based on the indices

the position

where you are

accessing the elements of the list

so we can do this

what i can also do is

we can specify names when creating a

particular list so for example what i

can also do is

i can say

product

dot

category

and now i can just give

list function

so i would want to assign names while

creating a list

so i can say for example

product

and this would be say

music

tracks

then i can give say for example count

and count would be hundred

and then ratings

and i can say five

and

now we can basically access this

list which we have created

so what we have done here unlike

earlier when we created a list and then

basically

used names function to assign a name to

it or each object here while creating a

list itself

we passed in the names so we can also do

that
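Passing the names while creating the list, as a sketch. The list is written here as prod.category; the walkthrough's exact spelling of the variable is not recoverable from the captions.

```r
prod.category <- list(product = "music tracks",
                      count = 100,
                      ratings = 5)

prod.category
names(prod.category)   # "product" "count" "ratings"
```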

now

if you would want to basically

display the list

or

compactly display the structure of a list

we can always use the str function

and here i can pass in the name

so let's choose this one

and this is

in a more compact way listing down the

elements of your list so list can be

containing other lists also and we can

also do that so for example i create one

more list for example i can say similar

product

and here i can give

a list again

and what i would want to do is

i would want to say

product

equals

and i can say film

and then i can basically give a count

and then i can give ratings

say 4

and here what i've done is i have just

created one more list

but my intention is not just to create a

list but i would want to add this

to

our existing list so what we can do here

is

we can take our previous list that is

product dot category

like what we did earlier

and now i intend to

say list

and here

i would want to

say for example

let's copy this

or we can just

so this is what we were doing when we

were creating a list using product

giving the names while creating a list

and what i also want to do is here i

will just say

similar

and then pass in

similar dot prod

so now if you look at

our list we have just added new elements

so this is one more way where we can

create a list and we can

basically add or our list can have other

list
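A sketch of a list that contains another list, displayed compactly with str. The count of 50 for the film entry is an assumption, since the walkthrough does not state it.

```r
similar.prod <- list(product = "film",
                     count = 50,      # assumed value
                     ratings = 4)

prod.category <- list(product = "music tracks",
                      count = 100,
                      ratings = 5,
                      similar = similar.prod)

str(prod.category)    # compact view of the nested structure
```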

so when we talk about subsetting or

extending list

so one of the main ways as i said to

access a specific element or a subset we

use double brackets and we can always do

that so for example we take

our prod dot category and then i would

want to access a particular element so i

can always do this by giving the index

positions

and i can access the elements of my list

so this is one single way now here

if we use a single bracket instead of

double bracket

then in that case we will the output

would be a list

so if i look at this one then this would

be a list but if you use double brackets

then you are accessing

a particular object
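The difference between double and single brackets, as a sketch. The names one and sub are hypothetical.

```r
prod.category <- list(product = "music tracks", count = 100, ratings = 5)

one <- prod.category[[1]]   # double brackets: the element itself
one                         # "music tracks"
class(one)                  # "character"

sub <- prod.category[1]     # single brackets: a list of length 1
class(sub)                  # "list"
```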

if we were creating a vector we could

just be using a subset by using the c

function

now what we can also do is we can subset

by names or even logical

so what we can do here is we can take

this product category and if we have

defined names then in that case what i

can do is i can say i would be

interested in music tracks

and this is the name we had given

so we can close this one and we can try

accessing the elements here so we what's

the name we had given

so

no it's not

music tracks that's the value the name

is product

so we do this and then we can access the

elements what we can also do is

we can be

subsetting based on logicals so what we

can do is we can basically just give

something like this and here we can pass

in values

something like this

and we missed a bracket

so

that's also a way of pulling out the

values so you can be doing a subsetting

using the names which you have assigned

to objects within your list

or you can say names which you have

assigned to the elements or by using

logicals now what we can also do is we

can use the dollar operator now if you

see here we are looking at the name

and that is preceded by dollar

so we can always pull out the values

from our list

by giving the list name and then give a

dollar symbol and then choose

the name for example if i choose product

i can list the values here

i can be

looking at say

dollar and then choose a count and this

is also one way of accessing your

elements from the list using your dollar

symbol now to add elements to a list as

i said you can add a vector of names

and that can be passed to your list

so these are different ways in which you

can work with list and then you can

access the elements either using indices

or using names or even using dollar

symbol and pointing the right names so

this is one simple example of working

with your list now

one more and now i can just do a ctrl l

and i can clear that off so

your list always remember is a generic

vector that can contain objects of

different types
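The access patterns covered in this part (by name, by logicals and with the dollar symbol) might look like the sketch below. The logical pattern and the names by_name and by_lgl are assumptions.

```r
prod.category <- list(product = "music tracks", count = 100, ratings = 5)

by_name <- prod.category[["product"]]           # "music tracks"
by_lgl  <- prod.category[c(TRUE, FALSE, TRUE)]  # keeps product and ratings

prod.category$product    # "music tracks"
prod.category$count      # 100
```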

now when we talk about matrices

now matrix is a collection of data

elements

arranged in two dimensional rectangular

layout so we can use matrix function to

create a matrix as shown here

so matrix is

two dimensional now we already know that

vector is one dimensional array of data

elements or a sequence of data elements

but when we talk about matrix it's a

collection of data elements that is

two-dimensional arranged in fixed number

of rows and columns so here you see that

we are creating a matrix and we have

specified the number of rows is 3 number

of columns is 3 and we want it to be

arranged by row where we have given the

value as true

so always remember matrix is 2

dimensional and matrix can have only one

atomic vector type unlike your list it's

a natural extension of vector going from

one dimension to two dimensions so

matrix actually needs a vector

which contains values that you place in

a matrix and at least one matrix

dimension so we can choose to specify

the number of rows or number of columns

when we are creating matrix so let's see

a quick example of working with matrix

so for example i could just say

matrix which will have values 1 to 6

and then i can basically give n

row

and you can give a value to this one and

that's my matrix similarly you could

also be giving

n columns

so i can just say n col

and i can choose this one and then pass

in the value so that's a matrix where r

fills values column by column now if you

intend to fill up matrix in a row wise

fashion so that your values 1 2 and 3

are in first row then we have to just

modify this in a little bit different

way so we have to say matrix

1 colon 6

n row is 2

and then i can give by

row

so you always have these helper

functions which allow you to

put out the values

so for example i do this

and then i can do a control enter so now

if you see

you have the values 1 2 and 3

in your first row
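The three matrix calls described above, as a sketch. The names m1, m2 and m3 are hypothetical, and the ncol value of 2 is an assumption.

```r
m1 <- matrix(1:6, nrow = 2)                 # filled column by column
m1

m2 <- matrix(1:6, ncol = 2)                 # 3 rows, 2 columns
m2

m3 <- matrix(1:6, nrow = 2, byrow = TRUE)   # 1 2 3 in the first row
m3
```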

so when we pass the matrix function a

vector

that is too short to fill up an entire

matrix

then something different happens we can

have a look at this

so say you pass a vector containing

value 1 to 3 to the matrix function

and say explicitly you want a matrix

with two rows and three columns how do

we do that so for example i can say

matrix

and here i can say one is to three

now i can give n row

and then i can give the number of rows

which we want is 2 and then i say n

column

and this one i'll say 3

so i can do this and here what i have

done is i have given the values 1 to 3

i have said number of rows is 2

and your number of columns is 3. so here

r fills the matrix column by column and

simply repeats the vector
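A sketch of the recycling just described, where a three element vector fills a two by three matrix. The name m is hypothetical.

```r
m <- matrix(1:3, nrow = 2, ncol = 3)
m
#      [,1] [,2] [,3]
# [1,]    1    3    2
# [2,]    2    1    3
```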

now if you want to fill using a four

element vector in a six element matrix

in that case

obviously r will generate a warning

message now apart from the simple matrix

function which we are seeing

you also have

some functions such as r bind

and c bind

which r offers when you are working

with matrices so we can use those so for

example i could say c bind i could say 1

colon 3 and then i can say 1 colon 3

and that's

my c bind that is column bind where i'm

passing the values 1 to 3 and which are

stacked in columns i can also do r

bind

and similarly we can be passing in the

values so i can say

r bind and that basically arranges the

values row wise

so let's be creating a variable for example

let's say n

and let me create

a matrix here

so i'll say matrix now that will contain

1 to 6

and i can say by row

and then you can give

value which is true

and then i can basically say the number

of rows

is going to be 2

and this is also fine so let's look at

the value of n here so you basically

created a matrix with one to six

you arrange them row wise and the number

of rows what you have chosen is two

so what we can also do is we can use r

bind and we can add values to it so for

example if i want to add value 7 to 9

what i can simply do is i can do a r

bind i can say i would want to add it to my n

and then pass in the values so i can

just do this and this has basically

appended or added values to existing

matrix so similarly you could have done

a column bind and you could have added

values

to your existing matrix so for example

if i

take this one

and look at my n and what i could do is

i could do a c bind

and then i can basically take my

n

and then pass in values to this one so

let's say 10 and 11

and basically i've added 10 11 as a

column to my existing matrix so this is

one simple way where you work with a

matrix and you are

appending the values either at a row

level or at a column level
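The cbind and rbind steps above, as a sketch. The names n_rows and n_cols are hypothetical; the walkthrough shows the results without necessarily storing them.

```r
cbind(1:3, 1:3)     # two columns holding 1 2 3
rbind(1:3, 1:3)     # two rows holding 1 2 3

n <- matrix(1:6, byrow = TRUE, nrow = 2)
n

n_rows <- rbind(n, 7:9)         # adds 7 8 9 as a third row
n_rows

n_cols <- cbind(n, c(10, 11))   # adds 10 11 as a fourth column
n_cols
```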

so let's also look at some other

examples so basically if you would want

to work with matrix one of the useful

things would be naming the matrix that

is in case of matrices we can assign

names to either the columns or the rows

if you don't do it we see the default

values here which follows a numbering

but what we can also do is we can use

two functions here one is row

names

or you can use

column names so these are the two

functions which can be used so for

example let's do a control l

let's try to get our n

and this is what we are doing here but

what we would want to do is we would

want to give them some names so for

example i'll say row names

and then

i will basically

pass in a vector which has

row names or vector which has column

names so what i can do here is i can say

i would want to give row names to n

and then

i basically give some value so for

example let's say

row one

and then let's say row two

and

now i can look at my n which has the row

names assigned to my rows similarly i

could have also given column names

so all i need to do here is i need to

say

column one

and then i will say column two

and then i can be using column names

and

let's look at this one so what went

wrong here so we have three columns here

we forgot that so we have to add one

more column name and then it should be

fine. so now if you look at this one we

have just given row names and column

names so

naming the columns or rows in your

matrices can be very useful now as the

previous error says there is also a

function called dim names

and that's basically an argument

of matrix function which can be used so

we could also do something like this so

for example i have

dim names

so let's have our n

and then what you can do is you can do a

dim names

which you can then just create a list

and in this one you can pass in a vector

for row one

and then vector for row 2

and what we can do here is once we have

given this

let's give a comma here

and then give c

and then give your column names which is

column one

column two

and then basically

column three

and now if you just look at dim names so

you can just see that you have given

some row names and column names and this

can be used basically to

assign to your list
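Naming rows and columns, first with rownames and colnames and then through the dimnames argument of matrix. The labels and the name n2 are assumptions.

```r
n <- matrix(1:6, byrow = TRUE, nrow = 2)

rownames(n) <- c("row1", "row2")
colnames(n) <- c("col1", "col2", "col3")   # one name per column
n

# the same names supplied while creating the matrix
n2 <- matrix(1:6, byrow = TRUE, nrow = 2,
             dimnames = list(c("row1", "row2"),
                             c("col1", "col2", "col3")))
dimnames(n2)
```

Supplying fewer names than there are columns is what produces the error seen in the walkthrough.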

so if you try to store different objects

in a matrix what would happen coercion

would take place right so for example if

i have x

and let's basically

try to create a matrix which will have 1

to 8

and let's say the number of columns

is going to be 2

so let's look at our x and this has the

values now what if i create

say l

and then basically

i will create a matrix which will be

a matrix of letters

so let's say letters

and then here with letters i'll

say 1 colon 6.

now i would want to give the number of

rows

and let's give it say four

and let's say number of columns

and let's give it three

and now let's look at the value of l so

it has letters

and x is having numbers and what if we

bind them together using

c bind which is for column wise binding

so for example if i do a c bind and then

pass in

my x comma l

so if you see here there is a coercion

which has happened where everything is

converted into character so you can

always do a class and you can check so

this is a simple example of working with

matrices there is much more you can do

subsetting like what we saw in list but

that we can learn later
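The coercion that happens when a numeric matrix and a character matrix are bound together can be sketched like this. The object names x and l follow the walkthrough above; the checks at the end are additions for illustration.

```r
# Numeric matrix: 1 to 8 in two columns (4 rows).
x <- matrix(1:8, ncol = 2)

# Character matrix: the six letters are recycled to fill 4 x 3 cells.
l <- matrix(letters[1:6], nrow = 4, ncol = 3)

# Column-wise binding; a matrix can hold only one type,
# so the numbers are coerced to character.
combined <- cbind(x, l)
print(combined)

print(class(x[1, 1]))         # "integer"
print(class(combined[1, 1]))  # "character"
print(dim(combined))          # 4 rows, 5 columns
```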

now let's learn about data frames

and what is the data frame and how do

you use r to work with data frame now

data frame is used to store the data in

the form of a table and for this

we have a function data.frame to

create a data frame

so what we know already is that data

sets are comprised of observations

which we also call instances and

variables and we always have

observations

to which

some variables are associated for

example we can talk about

data sets of

say five people now let's look at the

information here

here we look at the body mass index bmi

where we are using a data.frame

function and then we are passing in say

gender so we use the c function to pass

in the values and then you have height

and then you have weight

and age

and these things then become the columns

of your data frame so for example if we

would want to work

on creating a data frame for people

where

let's say each person is an instance

and properties about each person such as

name

age child

or if the person has a child would

become the variables so

if we have such kind of information we

cannot easily store that in matrix or

list

now data frames can be used for such

cases

now it's a fundamental data structure

to store data sets pretty similar to

matrix as it has rows and columns and

here

rows correspond to observations now here

we can talk about every individual or

every person

columns correspond to variables that is

properties for each person

now difference

between your data frame and matrix is

that data frames can contain elements of

different data types

so for example we can have one column

being character

another being numeric and yet another

being logical

so restriction is that elements in one

column should be of the same data type

now how do we work with data frames

let's see some examples so what we can

do is we can bring up our r

so when we talk about data frames

usually we don't create data frames by

ourselves we import data from data sources

such as csv file

or rdbms

or even your excel or spss and then we

create data frames now of course r has

ways to manually create data frames

using data.frame function

so we can create three vectors first and

then we can pass in those vectors to

create our data frame so let's do that

so let's say name

and here i will use the assignment

operator which we have learnt earlier

and then i'll use c and then i can give

some names here so let's say john

and let's say peter

let's say patrick

and let's say

julie

and let's also give one more name

so let's say

bob

so this is the vector which we are

creating and we can check

this is the vector which we have created

now obviously you can do a class

and you can check what is this

and that says it is a vector of

character

now similarly we can create one more

vector which is age

and let's give some numbers here so for

example let's say 28

and 30 31

38

35

and these are the values for the age so

age is also created similarly we can say

if each person has children so we can

say

children and then i'll create one more

vector and here i'll give values which

are logicals

i'm not going to give any numerics or

character but i'm using logicals here so

if a particular person has children or

no

so let's have this

vector created

and now we have three vectors that is

name age and children and we can use

this to create our data frame so we can

just call our data frame as df

and what we can do is we can use data

dot frame function

and then what we can do is we can pass

our vectors within this such as name

age

children

and that should create my data frame

let's have a look at this and this shows

me

that the data frame is created now

column names are inferred from variables

which are passed to data.frame

function so the variables which we have

passed to our data.frame function are

name age and children and those become

the column headings for my data frame
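The steps above can be sketched as follows. The names and ages are the ones read out in the walkthrough; the logical values for children are not stated, so the ones below are assumptions.

```r
# Three vectors of equal length, one per variable.
name <- c("John", "Peter", "Patrick", "Julie", "Bob")
age <- c(28, 30, 31, 38, 35)
children <- c(FALSE, TRUE, TRUE, FALSE, TRUE)  # assumed values

print(class(name))  # "character"

# Column names are inferred from the variable names.
df <- data.frame(name, age, children)
print(df)
print(names(df))

# Equivalent form with explicit column names.
df2 <- data.frame(name = name, age = age, children = children)
print(identical(df, df2))
```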

now what we could have also done is we

could have created it in a different way

so i could have said df

and then i could have used my data dot

frame function

and in data dot frame function i could

have said name is going to be name

age

would be age

and then i could say

children

could be

children

and i could do this and this is also one

more way where i'm creating a data frame

and in this way

we can now have

rows of data frames

like in matrix so this is also one way

of creating a data frame to look into

the data frame structure we can always

use str and then we can pass our data

frame and this basically prints out

similar to that of list

so

we also need to know that under the hood

data frame is a list and in this case

this is a list with three elements

so each list element is a vector of

length five corresponding to the number

of observations

if we create data frame with vectors not

of same length we would get an error
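A short sketch of what "a data frame is a list under the hood" means in practice, plus the error for vectors of different lengths. The children values and the two mismatched vectors are assumptions chosen for illustration.

```r
df <- data.frame(
  name = c("John", "Peter", "Patrick", "Julie", "Bob"),
  age = c(28, 30, 31, 38, 35),
  children = c(FALSE, TRUE, TRUE, FALSE, TRUE)  # assumed values
)

str(df)                # structure, printed much like a list
print(is.list(df))     # TRUE
print(length(df))      # 3 list elements, one per column
print(nrow(df))        # 5 observations

# Vectors of length 2 and 3 cannot be combined into one data frame.
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(
  data.frame(name = c("John", "Peter"), age = c(28, 30, 31)),
  error = function(e) conditionMessage(e)
)
print(result)
```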

now here when we look at our data frame

we know that name is a column now in r

versions before 4.0 a character column

like name would actually be stored as a

factor instead of character

to suppress this behavior we can always

use an argument that is stringsAsFactors

equals false so what i can do is

i can do a data frame

like this

use my data dot frame function

and then basically we can pass in our

vectors that is name

age

and

children

and then what i can do is i can say

stringsAsFactors and set this value to

false

so if i do this and now if i look at my

data frame structure

sorry

here

now let's look at this one and this one

shows me that

unlike your earlier one now we are

creating a data frame where our name

would be containing characters
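The effect of stringsAsFactors can be sketched by setting it explicitly both ways, which avoids relying on the default of any particular R version. The children values are assumptions.

```r
name <- c("John", "Peter", "Patrick", "Julie", "Bob")
age <- c(28, 30, 31, 38, 35)
children <- c(FALSE, TRUE, TRUE, FALSE, TRUE)  # assumed values

# Keep character columns as character.
df_chr <- data.frame(name, age, children, stringsAsFactors = FALSE)

# Convert character columns to factors.
df_fct <- data.frame(name, age, children, stringsAsFactors = TRUE)

str(df_chr)
str(df_fct)
print(class(df_chr$name))  # "character"
print(class(df_fct$name))  # "factor"
```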

the earlier one was also showing

character because in r 4.0 and later

this value is set to false by default

otherwise it would have converted

characters to factors now how do we

subset extend and sort data frames

in r

so as we have learned so far in brief

about your data frames

so

data frame is somewhere like an

intersection between matrices and lists

so if you would want to subset a data

frame we can always use

the square brackets and in that we can

use the single square brackets which are

from matrices or we can use

double square brackets

from list or we can also use the dollar

symbol

so that all these things can be used to

subset the data frame so let's use our

data frame which contains information

about people

so we can select single element from our

data frame so here

what we can do is

we can just say df and then i can use a

single bracket and i can just do a three

comma two

so it would be good if we can first

print the data frame and that's my value

and now let's do a single bracket and

let's look at this one so this tells me

that

we are

using the row index first which is

number three

which shows me that we would be going to

the row number three and then we point

or pass in our column index that is

number two

so we could have done it in a different

way also so we could have done df

and then give it row index and then give

the column name which you are interested

in looking at and that also gives me the

value so just like matrices we can

choose to omit one of two indices to end

up with entire row or entire column

and for example if we would be

interested in looking for information

for patrick what i could have done is i

could have just done

df 3 comma and this is showing me the

entire row

now always remember whatever results we

see here that is

giving me a data frame with a single

observation because there has to be a

way to store different data types and

that's why the result is also a data

frame

what we can also do is to get entire age

column we can just use our data frame

and then

we can pass in the column name here

like this and that gives me

just the column now here

the point to notice is result is a

vector because columns contain elements

of the same type

in previous example we were seeing a row

and that row was

not a vector it was a data frame because

values were of different data types now

subsetting a data frame that results in

a data frame

and contains multiple observations can

also be done by doing something like

this for example i will do df

and then i will say

let me get

3 comma 5

and

then i can just say

age and children for example

so let's say age

and

children

and i can be pulling out the values in

this way

so i could also be

just getting

the results in the age column if i'm

interested in by just saying df

and here i can just pass in the column

number and that also gives me the age

column
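The single-bracket subsetting described above can be sketched as follows, assuming the five-person data frame from earlier (the children values are assumptions).

```r
df <- data.frame(
  name = c("John", "Peter", "Patrick", "Julie", "Bob"),
  age = c(28, 30, 31, 38, 35),
  children = c(FALSE, TRUE, TRUE, FALSE, TRUE)  # assumed values
)

print(df[3, 2])        # row 3, column 2: Patrick's age, 31
print(df[3, "age"])    # same element, column chosen by name

print(df[3, ])         # whole row: still a data frame (mixed types)
print(df[, "age"])     # whole column: a plain numeric vector

# Several rows and several columns: the result is a data frame.
print(df[c(3, 5), c("age", "children")])
```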

now we know data frame is a list

containing vectors of same length

this means we can use list syntax to

select elements also and

what we can do is we can use our dollar

symbol and then choose the column name

and this is also one way wherein you can

pull out the values or you can use

double brackets as i mentioned earlier

and pass in the column name

so that's also fine

or you can give a column number

and that also would work and in all

these cases result is a vector

now with single brackets you can still

do it always remember if you use single

brackets then that will result in a data

frame

the result

can be a data frame here

but

what we are seeing here is

a list which contains only age column

having the data elements

so these are different ways in which you

can do a subsetting of a data frame now

using single brackets or double brackets

can have serious consequences so we need

to always think about what we are

dealing with and how are we handling it
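The difference between the dollar sign, double brackets and single brackets can be sketched like this, again assuming the same five-person data frame with assumed children values.

```r
df <- data.frame(
  name = c("John", "Peter", "Patrick", "Julie", "Bob"),
  age = c(28, 30, 31, 38, 35),
  children = c(FALSE, TRUE, TRUE, FALSE, TRUE)  # assumed values
)

# List-style access: each of these returns a vector.
print(df$age)
print(df[["age"]])
print(df[[2]])

# Single brackets without a comma: the result is a one-column data frame.
print(df["age"])
print(df[2])

print(is.data.frame(df["age"]))    # TRUE
print(is.data.frame(df[["age"]]))  # FALSE
```

Code that expects a vector (for example a call to mean) should receive the double-bracket or dollar form, which is why the choice of brackets matters.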

now what we can also do is we can extend

our data frames that is we can add

variables we can add

columns that is adding variables or we

can add rows which are nothing but

observations

so adding columns is like adding new

elements to the list and for which we

can obviously use dollar or double

brackets say for example now this is my

data frame and if we would want to add

height

whose information is in a vector so

let's say height

let's create a vector here and this one

is what i would want to add for each

person so let me do this

and let me pass in some values here

and

the last one

something like this so this is a vector

created now what i can do is

so our data frame is called df so we

can say df dollar

height

and then i will pass in this vector here

and now if i look at my data frame you

see

the fourth column has been added and

that's my height column

now what i can do is

i could have done it in a different way

basically

if i had my data frame i could have just

done df double brackets and then give it

a name

and then

i could have passed my vector in this

way however so this is also one way of

doing it we have already added the

column so we don't need to repeat the

step now what we can also do is we can

use

a

c bind function and if you remember c

bind that is for column binding so for

example let's create a weight vector now

and let's pass in some values here so

for example let's say 75

65 54 34

78

and these are my values of weight now

what i can do is i can just do a c bind

and then pass in my data frame and then

pass in this vector

and in this way i'm just adding columns

or i'm extending my data frames by

adding more columns to it now obviously

if we can use c bind then we can also

use r bind to add new rows
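The column additions described above can be sketched as follows. The weight values are the ones read out in the walkthrough; the height values and the children values are assumptions.

```r
df <- data.frame(
  name = c("John", "Peter", "Patrick", "Julie", "Bob"),
  age = c(28, 30, 31, 38, 35),
  children = c(FALSE, TRUE, TRUE, FALSE, TRUE)  # assumed values
)

# Add a column with the dollar sign; this changes df itself.
height <- c(180, 175, 168, 162, 190)  # assumed values
df$height <- height

# Equivalent with double brackets.
df[["height"]] <- height

# cbind() returns a NEW data frame; df is unchanged unless reassigned.
weight <- c(75, 65, 54, 34, 78)
df_with_weight <- cbind(df, weight)

print(ncol(df))              # 4
print(ncol(df_with_weight))  # 5
print(df_with_weight)
```

Because the result of cbind is not stored back into df here, df still has four columns, which matters for the row binding discussed next.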

so

for r bind creating a new vector won't

work

because we need to create a new data

frame with one single observation

remember row

will have values of different data types

so we cannot create a vector we have to

create a new data frame and then we can

add it using r bind

so let me create a data frame here for

example let's call data frame as tom

and let's pass in some values here so i

will say data.frame function

and then let's give name

what we can do is we can give age

then we can give the logical value

then we can give say height

and since we have added weight let's

also add weight

and this is my data frame now we can use

r bind function so i can say r bind

and then i can pass in my data frame and

this new data frame which we have

created

and

this tells me

that

the number of columns of arguments do

not match so we will have to check this

one

so we

have

our data frame which has just height so

it does not have the weight

that was only as the result of c bind so

let's create tom again without

weight

and now let's do a r bind

and let's again check what is the reason

here

so this is height

and let me just check this

so to look at this

this is the error we were getting

because i was creating a data frame with

four columns and then i was trying to

add that to a data frame which had three

columns now yes we had done a c bind and

c bind was showing us the fourth or

fifth column but the original data frame

only had three columns

so what i did here was i did tom

and then basically

i

created a data frame with three columns

which matches with my original data

frame

which had three columns and then i could

use r bind to basically add one more row

so what we did was we used r bind and r

bind was used to add a new row to our

data frame
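Adding a row with rbind, and the column mismatch error, can be sketched as follows. This assumes a data frame with four columns (name, age, children, height); Tom's values and the height and children values are assumptions.

```r
df <- data.frame(
  name = c("John", "Peter", "Patrick", "Julie", "Bob"),
  age = c(28, 30, 31, 38, 35),
  children = c(FALSE, TRUE, TRUE, FALSE, TRUE),  # assumed values
  height = c(180, 175, 168, 162, 190)            # assumed values
)

# A new observation must be a one-row data frame with the same columns.
tom <- data.frame(name = "Tom", age = 37, children = FALSE, height = 183)

# A row with an extra column does not fit; capture the error message.
tom_extra <- cbind(tom, weight = 80)
err <- tryCatch(rbind(df, tom_extra),
                error = function(e) conditionMessage(e))
print(err)

# With matching columns the row is appended.
df <- rbind(df, tom)
print(df)
```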

now when it comes to sorting or ordering

your data frame say for example we want

to sort data frame by age

now how do we do that so we could easily

do sort

df

and then select

our column and we could just do a

sorting now if we do this it is good but

not really what we need

now another clearer way of doing that would

be using ranks so for example if i do a

ranks and instead of doing a sort i

would use order

and then basically pass in my column so

i would say df

and then i would use age

now in this case

if i look at ranks it shows me

a vector of ranks with rank position of

each element now if i do a df dollar age

it shows me

the values and if you look at the ranks

it will tell

us that here the lowest value

is 28

and that's the lowest value and that's

why we see as rank as one and so on we

can look at the ranks so what we can

also do is we can just do a df

and then basically use ranks and we can

just look at the result so this shows

data frame which is

an ordered data frame now based on ranks

now if we would want to do it in a

descending order what we could also do

is we could do a df

and then use order

and within order i will basically pass

my data frame i will choose my column

and then i could say decreasing

equals true

and i could do this

and here this could show me the value so

it says undefined column names so what i

would have to check is what is my data

frame here so we have age

and then what we would have to do is we

would have to select a particular column

so let's do that

and here i have just selected the column

and then there is a comma missing that

was showing an error so now we can have

the data

ordered in a descending way so there are

dozens of packages such as dplyr and

data.table which can help you

manipulate filter merge and sort your

data frames so this is in brief about

the data frames

working with data frames subsetting them

and also sorting the data in your data

frames
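Sorting a data frame by age, as walked through above, can be sketched like this. The five-person data frame and its children values are assumptions carried over from the earlier steps.

```r
df <- data.frame(
  name = c("John", "Peter", "Patrick", "Julie", "Bob"),
  age = c(28, 30, 31, 38, 35),
  children = c(FALSE, TRUE, TRUE, FALSE, TRUE)  # assumed values
)

# sort() only returns the sorted column, not the reordered rows.
print(sort(df$age))

# order() returns the row positions that put age in ascending order.
ranks <- order(df$age)
print(ranks)

# Use those positions as the row index; note the comma.
sorted_df <- df[ranks, ]
print(sorted_df)

# Descending order.
desc_df <- df[order(df$age, decreasing = TRUE), ]
print(desc_df)
```

Leaving out the comma, as in df[order(df$age)], asks for columns instead of rows, which is what produces the "undefined columns" error mentioned in the walkthrough.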

now one more important

type of object in your r is vector and

that really helps us in

various ways so let's see how we work on

vectors here so to create a vector we

can use the c function and pass in the

values those will be the objects or

elements within the vector

and then you can look at the value of

the vector or also at the class of it

which tells me the values are numeric

now in case of vector all the values

have to be of the same type

or belong to the same class we can say

so here we are creating a vector looking

at the value of it and then looking at

the class which says the values passed

in here are character

similarly we can do it for logicals

that is true false and then look at the

value of this and this class is logical

now what we can also do is we can print

all the three vectors at once and here

we will use semicolon to separate two or

more variables

and we can pull out the values of all

the vectors which we see here
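Creating vectors with c and checking their class can be sketched as follows. The variable names and values are assumptions, since the ones on screen are not read out.

```r
# One vector per basic type; names and values are illustrative.
v_num <- c(10, 20, 30)
v_chr <- c("a", "b", "c")
v_lgl <- c(TRUE, FALSE, TRUE)

print(class(v_num))  # "numeric"
print(class(v_chr))  # "character"
print(class(v_lgl))  # "logical"

# Semicolons separate several statements on one line.
print(v_num); print(v_chr); print(v_lgl)
```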

now what happens if

we pass in the values which belong to

different classes or you can say

different data types so within a vector

if you do that there is something called

as coercion which takes place which will

convert all the values into one type and

in this case it has converted everything

into character

similarly

we can pass in values wherein we can

pass logical and numeric and in this

case it's not going to go for character

it is going to convert everything into

numeric

now if i had done this where i passed a

character and numeric

and if you look at this then it has

converted everything into character so

character always takes a precedence if

it is one of the values of vector

and you have other values which are not

characters then in that case coercion

will happen
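The coercion rules just described can be sketched with three small vectors; the values are assumptions for illustration.

```r
# Mixed types: everything becomes character.
mixed <- c(1, "a", TRUE)
print(mixed)
print(class(mixed))    # "character"

# Logical and numeric: logicals become 1 and 0.
lgl_num <- c(TRUE, 2, FALSE)
print(lgl_num)
print(class(lgl_num))  # "numeric"

# Character and numeric: character wins.
chr_num <- c("x", 5)
print(class(chr_num))  # "character"
```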

there is one more way of creating a

vector and that is by providing a range

to your c function so we can do that

here

wherein i said c

1 colon 20 and then basically look at

the value of vector 7 so it shows me all

the values starting from 1 till 20

however there is one more way you can

use the sequence function to do the same

thing

now

i could avoid the bracket i could avoid

the c function and i can straight away

pass a range and that is also fine to

create our vector starting from 1 ending

at 25

so what if i want to create a vector

with odd values between 1 to 20. now in

this case i am going to say how many

values to skip or to jump so i'm

creating a variable called odd value i'm

using sequence function

and then to that i'm passing the

beginning number the ending number and

then the skip or the jump

and now if you look at the values it

shows me only the odd values

well you could have done the same thing

to get even values and that's not very

complicated so you can start from 2

and then you can do skip wherein

after 2 it basically gives you

every second value so we are looking at

the even values and this is how you can

create a vector which is having odd or

even values
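The range and sequence examples above can be sketched as follows. The variable names are assumptions; the calls are base R.

```r
# A range inside c(), and the same thing without c().
v_range <- c(1:20)
v_plain <- 1:20
print(identical(v_range, v_plain))  # TRUE

# seq() with a start, an end and a step ("skip").
odd_values <- seq(1, 20, by = 2)
even_values <- seq(2, 20, by = 2)

print(odd_values)
print(even_values)
```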

now what if you want to create a vector

with 10 odd values starting from 20 so

you are basically giving a length so

here you can say from where you would

want to start

what is your skip and then the length of

the vector which tells me it gives me 10

odd values

beginning from 20

or from 20 onwards that is we take it

from 21. now one of the

requirements is always to name the

values so that we can access the values

either by indexing or by their name

which have been passed to the value so

let's see that so let's create a vector

which is called temperature so variable

is temperature pass in the values to

this

look at the values of temperature now

what we would want to do is we would

want to assign these names to each value

which makes it more readable more

accessible

so i can use the names function

pass in my temperature as a vector to

names function and then assign the names

to

each value of temperature now if you

look at temperature it shows me the

names which have been assigned well we

could have done it in a different way we

could have created a vector of names

something like this

and then

what i could have done is i could have

created one more vector such as

temperature and instead of assigning

values we could have assigned the vector

to our

existing vector so if you do this so you

are assigning the names vector to the

temperature 1 and now look at the values

it still does the same thing

so this is where you are assigning names

to every value

of your existing vector

now there is one more way and that is

using your sequence so here i am

creating a sequence which starts with

100 and ends at 220 with a skip of

20 values

or every jump would be 20 values so

let's do that

use your names function on price

and then what i'm going to do is i'm

going to use the paste0 function

which takes p

and then

1 to 7 as the values so we know paste0

basically skips the space

and we are going to assign those values

to

as names to price

and now let's look at our price so that

basically gives me the names as we

desire so these are some smarter ways of

assigning names to

every element or every object within

your vector
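The naming approaches above can be sketched as follows. The temperature values and day names are assumptions, and the price sequence is assumed to run from 100 to 220 so that it has seven values to match the names p1 to p7.

```r
# Name each value directly.
temperature <- c(72, 71, 68, 73, 70)  # assumed values
names(temperature) <- c("Mon", "Tue", "Wed", "Thu", "Fri")
print(temperature)

# Or keep the names in their own vector and assign that.
days <- c("Mon", "Tue", "Wed", "Thu", "Fri")
temperature1 <- c(72, 71, 68, 73, 70)
names(temperature1) <- days
print(temperature1)

# Generate names with paste0(), which joins without a space.
price <- seq(100, 220, by = 20)   # assumed end value: 7 elements
names(price) <- paste0("p", 1:7)
print(price)
```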

now how do we perform some basic

operations let's have a look

so let's create a vector passing in the

values

and then you can simply do

an addition on two vectors where each

element is getting added to other

element of the vector

you can

subtract two vectors that is element to

element subtraction

element to element multiplication or

division

and you can basically perform operations

on the vectors now how do we use some

inbuilt basic math functions and that's

pretty easy

this is my vector now let's do a sum

which sums up all the elements let's

find out a standard deviation for all

the values let's find out the variance

for all the values here let's do a

product of vector values find the

maximum or find the minimum value so

these are some basic inbuilt math

functions which sometimes are useful in

our data science or data analysis kind

of activities
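Element-wise arithmetic and the built-in math functions mentioned above can be sketched as follows; the two vectors are assumptions chosen to keep the results easy to check.

```r
v1 <- c(2, 4, 6)  # assumed values
v2 <- c(1, 2, 3)  # assumed values

# Arithmetic works element by element.
print(v1 + v2)
print(v1 - v2)
print(v1 * v2)
print(v1 / v2)

# Built-in summary functions work on the whole vector.
print(sum(v1))   # 12
print(sd(v1))    # 2
print(var(v1))   # 4
print(prod(v1))  # 48
print(max(v1))   # 6
print(min(v1))   # 2
```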

now one more requirement might be

comparing the vectors

using comparison operators

and this is where i create a vector 1

create a vector 2 and let's find out the

values in v1 which are smaller than v2

values and that gives me the logicals as

the response that is false true and

false
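Element-wise comparison of two vectors can be sketched like this. The values are assumptions, picked so that the first comparison returns FALSE TRUE FALSE as in the walkthrough; the other comparison operators work the same way.

```r
v1 <- c(5, 6, 8)  # assumed values
v2 <- c(4, 9, 7)  # assumed values

print(v1 < v2)    # FALSE TRUE FALSE
print(v1 > v2)
print(v1 != v2)
print(v1 == v2)

# A single value is compared against every element.
v <- c(1, 2, 3, 4, 5)
print(v < 3)
```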

similarly you can do v1 greater than v2

or you can say where v1 values are not

equal to v2

or equal to v2 so these are some simple

comparison examples now i can create a

different vector and then i can find out

individually if the elements in the

vector are less than 3 by just doing a

v

less than 3 so it compares each

element with this so you are actually

using one scalar value to compare it

with all the elements and you can do

that it gives you the logicals so you

can

also

be doing slicing and indexing on vectors

and this is very much important when you

are storing your data in vectors how do

you access them so let's create a vector

using sequence

let's give it some names as we have seen

in past

and let's look at our price one so that

tells me the name and the values now

you can access the elements using

indexing so let's get the third element

and it shows me 590. remember the

indexing here starts with one unlike

other programming languages like python

where indexing starts with zero now i

can also get the third and fourth value

by doing a three colon four i can also

specify the vector

and say one comma four and that shows me

the first and the fourth position or

second or sixth position so this is one

way where you are using indexing to

access the elements
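Positional indexing can be sketched as follows. The exact sequence behind price1 is not given, so the one below is an assumption chosen so that the third element is 590, matching the walkthrough.

```r
# Assumed sequence: 550, 570, ..., 670 (seven values).
price1 <- seq(550, 670, by = 20)
names(price1) <- paste0("p", 1:7)
print(price1)

print(price1[3])        # third element (indexing starts at 1)
print(price1[3:4])      # third and fourth
print(price1[c(1, 4)])  # first and fourth
print(price1[c(2, 6)])  # second and sixth
```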

similarly i can give the names now

that's where we see the benefit of

giving names to every element so i can

use c

function pass in the name and look at

the value

for that particular name or selectively

select different columns or different

names

or we can also use this square bracket

wherein we pass the names

so sometimes it can also be useful to

use logical positioning that is we would

want to

find out the logical position if the

value exists and we can do that

or using true and false and then look at

the values

so

there is one

useful

way where you can exclude a particular

position might be that is an n a value

might be a value which you are not

interested in and that's where you will

say minus 2

which will skip the p 2 value or minus 2

and minus 5

where we are skipping a p 2 and p 5

and we can exclude particular values

from our vector

now how do we do a comparison operator

on the values of vector so you can just

say price 1 and i would want all the

values which are greater than 600

or you can assign this to a filter and

then basically

pass in the filter for your

vector
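Selecting by name, by logical position, by exclusion and by a filter can be sketched as follows. The price1 values are the same assumption as before (550 to 670 in steps of 20); the name keep is hypothetical.

```r
price1 <- seq(550, 670, by = 20)  # assumed values
names(price1) <- paste0("p", 1:7)

# By name.
print(price1["p3"])
print(price1[c("p2", "p5")])

# By logical position: TRUE keeps, FALSE drops.
print(price1[c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)])

# Negative positions exclude elements.
print(price1[-2])
print(price1[c(-2, -5)])

# Comparison used directly as a filter, or stored first.
print(price1[price1 > 600])
keep <- price1 > 600
print(price1[keep])
```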

so these are some simple basic

operations which you can run

using your r programming where you would

want to manipulate where you would want

to store some data and extract that data

use your different logical operators or

other operators and perform your basic

easy computations

now that we have seen some basic

operations using r let's look at some

more

operations when you're working with

vectors such as one of the common issues

is handling the missing values now here

we are

assigning a vector

to a

variable order detail

and this one has a missing value now

let's see how this is handled and you

see all the values in the vector are

assigned what you can also do is you can

assign names

as we have seen earlier by using the

names function

and then look at the value of order

detail so you see the names and these

are your missing values which are also

taken care now what we can also do is we

can perform an operation on a particular

vector which will be applied to all

values of the vector so for example here

i will just add a scalar value plus 5 to

the elements in the vector

and that shows me number five has been

added to each element or each object in

the vector
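A vector with a missing value, and a scalar added to every element, can be sketched like this. The values 10, 20, 30, NA, 50, 60 are the ones stated later in the walkthrough; the element names are assumptions.

```r
order_detail <- c(10, 20, 30, NA, 50, 60)
names(order_detail) <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat")  # assumed names
print(order_detail)

# The scalar is added to every element; NA stays NA.
plus_five <- order_detail + 5
print(plus_five)
```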

now if you would want to work on two

vectors for example to add two vectors

let's create a vector called new order

and then

let's add it to order detail now in this

case

what we are doing is we have a vector

which is 5 and 10

and what we are doing is we are adding

values to order detail now our order

detail earlier was 10 20 30 n a 50 and

60

and what i have done is i have passed in

a vector which is 5 and 10 and you are

adding it to the elements so 5 gets

added to 10

and then your value 10 gets added to 20

and then you have again 5 which is added

to 30 now you cannot add in anything to

a missing value so that remains as it is

then you add again 5 to 50 and then 10

is added to 60. so in this way you are

adding two vectors which are not of same

length but you are adding these values
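The behaviour described above is R's recycling rule: the shorter vector is repeated until it matches the longer one. A minimal sketch, using the values from the walkthrough (the name new_order follows the transcript; updated is hypothetical):

```r
order_detail <- c(10, 20, 30, NA, 50, 60)
new_order <- c(5, 10)

# new_order is recycled as 5, 10, 5, 10, 5, 10.
updated <- order_detail + new_order
print(updated)
```

Recycling is silent here because 6 is a multiple of 2; with lengths that do not divide evenly, R still recycles but issues a warning.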

now what i can also do is

i can update the order

by doing this

so i'm creating an update order

and now let's look at the value of

update order what does it show

so you are basically doing the same

thing

so

if you would want to work on a subset of

vector how do you do that so here you

are using some indexes so i'm saying

order detail

and this is my order detail

so let's take one colon two and assign

it to first two so if we look at the

value which is assigned to first two we

have just sliced and added a subset of

vector to this one and if i would want

to take the length

of order detail it shows me

the length here which is six elements

here

including the missing value also

what we can also do is

we can do some more operations so for

example from order detail what i'm doing

is i'm saying length minus 1

and then

up to the length so let's do this and

let's see the result of this so what we

have done is

we had our order detail which had

these values

and what we have done is we have said

length minus 1

colon length so you have taken these two

elements and you have assigned that to

your v1

similarly we can do length minus 1 colon 2

elements so i can do this and now let's

look at the value of v2 so this shows me

the value where you are taking length

minus 1 and then you are taking it till

the second position of the index element

which is 20 so you are getting in the

values here so you get

your

50

n a 30 and 20 because you started with

length minus 1 and up till the second

index position
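Slicing relative to the length of the vector can be sketched as follows. The names first_two, v1 and v2 follow the walkthrough; n is a helper introduced here.

```r
order_detail <- c(10, 20, 30, NA, 50, 60)

first_two <- order_detail[1:2]
print(first_two)

n <- length(order_detail)  # 6, the NA counts as an element
print(n)

# Last two elements. The parentheses matter:
# n - 1:2 would be read as n - (1:2), i.e. positions 5 and 4.
v1 <- order_detail[(n - 1):n]
print(v1)

# From the second-to-last element back down to the second.
v2 <- order_detail[(n - 1):2]
print(v2)
```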

similarly we can use the length and we

can take it from

this element and let's look at the value

of v3 so that shows me that i'm

doing some slicing or i'm getting subset

of my vector so similarly you can also

do this one so v4

and let's do this and then let's look at

the value of v4

so it gives me the values based on our

subsetting or slicing now you can

extract all the values below 30 and this

is where you are doing a comparison so

you will take your vector and then

you would want to compare each value if

it is less than 30 and you would want to

take all the values here so it gives me

the logicals or the response for all the

values which are less than 30

what we can do is

we can also

use the square brackets and do this this

will show me the actual values here we

were just getting the logicals but here

we are getting the values

now to omit any value from the vector we

can use na.omit

and this one will help me

in getting rid of the n a values plus i

am also checking the values if they are

less than 30 and then

i am basically doing

using this na.omit

so you can do something like this you

can look at the values what you can also

do is you can find the order details

that are multiples of three and here we

would want to use modulus and we would

want to find out if the remainder is

zero then i am getting the numbers which

are

divisible or multiples of 3. so let's do

this

and it gives me again the logical values

of all the values which are divisible by

3 giving us a remainder of 0

or if you would want to look at the

values then you can say order detail

open up a square bracket and then pass

in

your condition

now we can then omit

the n a from this one and then we can look

at the values

so this is simple way where you are

subsetting a vector or extracting the

values which you are interested in which

might be one of the requirements of

your data wrangling or data manipulation

or just data extraction
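The filtering steps above can be sketched as follows, using the order_detail values from the walkthrough. The result names are hypothetical, and as.numeric is added only to strip the bookkeeping attribute that na.omit attaches.

```r
order_detail <- c(10, 20, 30, NA, 50, 60)

# Logical result: the NA position stays NA.
print(order_detail < 30)

# Values instead of logicals; the NA comes along.
print(order_detail[order_detail < 30])

# Drop the NA from the result.
below_30 <- as.numeric(na.omit(order_detail[order_detail < 30]))
print(below_30)

# Multiples of 3: remainder of division by 3 is zero.
print(order_detail %% 3 == 0)
multiples_of_3 <- as.numeric(na.omit(order_detail[order_detail %% 3 == 0]))
print(multiples_of_3)
```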

now i can also use a sum function

now

if we do this it returns n a because

there is already a missing value and you

cannot do a sum on the values

now what i can do is i can use

na.rm

to remove the n a values
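The effect of a missing value on sum, and the na.rm argument, can be sketched like this with the same order_detail values:

```r
order_detail <- c(10, 20, 30, NA, 50, 60)

print(sum(order_detail))                 # NA, because one value is missing
print(sum(order_detail, na.rm = TRUE))   # 170

# The same argument works for other summary functions.
print(mean(order_detail, na.rm = TRUE))  # 34
print(max(order_detail, na.rm = TRUE))   # 60
print(min(order_detail, na.rm = TRUE))   # 10
print(sd(order_detail, na.rm = TRUE))
```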

so i can do a sum on order detail where

i intend to add up all the values but

what i also want to do is i want to

remove the n a value so i'm giving it a

value as true and then if i do it it

gives me the sum of all the values so

similarly you can do a mean you can do a

maximum you can find out the minimum

value standard deviation or even square

root now these are some simple

operations what we are doing on vector

where we are interested in extracting

some specific values now let's look at

matrix which we have also discussed and

matrix is also one way where you can use

the matrix function to create a matrix

which is multi-dimensional so for

example if i do this and if i look at

the value of v

i get a vector which starts with a value

of 20 ends with 30 and at any point of

time you can convert this to matrix so

first we created a vector and now i'll

create a matrix out of it wherein i am

seeing the row numbers i am seeing the

column number and i am seeing the values

in that particular column

so

you have already done that now let's

take it to the next level so let's

create a matrix wherein we are using the

matrix function we will say 0 comma 3

comma 3 and now let's look what it has

done so you have created a matrix which

is of

three columns and three rows and by

default the row number and column

numbers have been assigned to them we

can also create a matrix by passing in

values so we can say 1 colon 9 and then

give the dimensions that is number of

columns is 3 number of rows is 3

and if i look at the matrix now i have

passed in the values to my matrix

sometimes you may want to arrange the

data in a matrix for particular kind of

calculations

you can also use nrow and byrow

so

you can say how many number of rows you

would want

and you would want to assign the data

row wise so when we are doing this now

if you notice the difference between the

previous one where we just gave the

values and we said three rows and three

columns so it was doing it column wise

so one two three four five six seven

eight nine but here we said byrow is

true so it has arranged the values in a

row wise fashion so it goes one two

three four five six and seven eight nine

similarly i could have just done this by

giving the dimension and selecting by

row and if i do this it is still doing

the same thing
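the three ways of creating a matrix described above can be sketched like this. the variable names are only for illustration.

```r
m_zero <- matrix(0, 3, 3)                       # 3 x 3 matrix filled with zeros
m_col  <- matrix(1:9, nrow = 3, ncol = 3)       # filled column wise (the default)
m_row  <- matrix(1:9, nrow = 3, byrow = TRUE)   # filled row wise

m_col[, 1]   # first column: 1 2 3
m_row[1, ]   # first row:    1 2 3
```

with byrow = TRUE the result is the transpose of the column wise matrix.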

now what we can also do is we can create

matrix using vectors

so here let's create

a vector stock one and then stock2

now

we would want to merge both the vectors

so you can always do a c function and

then create a new vector that is stocks

which is a merged result of stock one and

stock two and let's look at the results

so that's my stocks that's a vector and

now what i would want to do is i would

want to create a matrix

using the stocks so i'm giving it a name

that is stock dot matrix i'm using the

matrix function wherein i will pass my

vector

i will say by row so i want the values

to be arranged row wise and i'm also

selecting the number of rows

so if you look at this one so the values

which we had in our stock

which was all the values have now been

arranged row wise and in two rows so it

starts with 450 51 52 45 and 68 that's

my first row and the rest five values

are arranged in the second row so one of

the main requirements is instead of

going for default

column names and default row names we

can give specific names to our columns

and rows to make more sense to the data

how do we do that so we can basically

say days

so this is a vector which we are

creating

and then what we want to do is we want

to create a new variable which is stock

1 and stock 2.

now

this is for my columns and this will be

for my rows now how do we assign that so

we can say column names and this is

where i will say on my stock dot matrix

i will assign days which has five values

and that will become my column names and

similarly using row names function i can

basically assign row names to my

matrix so if i look at my matrix now it

shows me the column names and row names

which we have assigned or which we have

passed to our matrix
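a rough sketch of the stock matrix steps. the prices and the name st.names are assumptions, since the exact values on screen are not recoverable from the transcript.

```r
# hypothetical prices for two stocks over five days
stock1 <- c(450, 451, 452, 445, 468)
stock2 <- c(230, 231, 232, 236, 228)

stocks <- c(stock1, stock2)                              # one vector of ten values
stock.matrix <- matrix(stocks, byrow = TRUE, nrow = 2)   # first five values go in row one

days     <- c("Mon", "Tue", "Wed", "Thu", "Fri")         # will label the columns
st.names <- c("Stock1", "Stock2")                        # will label the rows
colnames(stock.matrix) <- days
rownames(stock.matrix) <- st.names

stock.matrix
```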

now there are different functions which

are associated with the matrix and let's

look at some examples so these are some

simple basic examples now if i say

let me find out the number of rows and

that gives me the number of rows or

number of columns or get a dimension

that is the number of rows and columns

of your matrix

now

we might be just interested in getting

the row names or column names or even

the dimension names which basically

returns both

the row and column names

so in this way you can use these simple

functions which are associated with

matrix to extract information about your

matrix or data which has been

transformed into matrix to pull out some

information about that
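the inspection functions mentioned above, shown on a small matrix whose values and names are made up for illustration.

```r
m <- matrix(1:10, nrow = 2, byrow = TRUE,
            dimnames = list(c("Stock1", "Stock2"),
                            c("Mon", "Tue", "Wed", "Thu", "Fri")))

nrow(m)       # 2
ncol(m)       # 5
dim(m)        # 2 5
rownames(m)   # "Stock1" "Stock2"
colnames(m)   # "Mon" "Tue" "Wed" "Thu" "Fri"
dimnames(m)   # a list holding the row names and then the column names
```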

one of the requirements

which data scientist or data analyst

might face is

carrying out

arithmetic operations on your matrix now

what we can do is we can create a matrix

which takes values 1 to 50. we want to

arrange it by rows and we will say

number of rows is 5 so that's my values

starting from 1

now i can do a addition

here by just doing a 5 plus mat 1 and if

you notice number 5 as a scalar value

has been added to

every element of the matrix

similarly you can do a multiplication

you can do a division

you can basically return the quotient if

you would want to do that or go for

exponential values so you can perform

simple arithmetic operations

for every element of the matrix

and what if you want to have

arithmetic operations done on multiple

matrices so let's compute mat one plus mat

one

and we get a total where every element

is added to the element in the same position

you can do a subtraction

you can do a multiplication and you can

get the value so this might be also very

useful when you are working on

multi-dimensional data
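a short sketch of the arithmetic described above, using the same 1 to 50 matrix. the result names are only for illustration.

```r
mat1 <- matrix(1:50, byrow = TRUE, nrow = 5)   # 5 rows, 10 columns

plus5   <- 5 + mat1     # the scalar is added to every element
times2  <- mat1 * 2
halves  <- mat1 / 2
quot    <- mat1 %/% 2   # integer quotient
squares <- mat1 ^ 2

doubled  <- mat1 + mat1   # element by element
products <- mat1 * mat1   # also element by element, %*% is the matrix product
```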

you can also

do some more operations on matrix such

as

getting the sum for each column

or say you are doing a summation at a

row level or you want to do a mean for

every row you can do that by using these

simple functions
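these are most likely colSums, rowSums and rowMeans, which are base r functions. a small sketch on a made up matrix:

```r
mat1 <- matrix(1:6, nrow = 2, byrow = TRUE)   # rows are 1 2 3 and 4 5 6

colSums(mat1)    # 5 7 9
rowSums(mat1)    # 6 15
rowMeans(mat1)   # 2 5
```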

now

you can add rows and columns to a matrix

using r bind and c bind functions

so r bind is for row bind and c bind is

for column bind but for that we have to

first create a vector so

let me create a vector of same length

which will then be

added as a row to my existing

matrix now my matrix has five

columns so let's create a vector with

five elements

and then i can basically add this as a

row to my existing matrix

by doing this and now if i look at my

values i will see the new values

as the third row

and if you also see

the variable name becomes the row name

and we have added a row to our matrix

now similarly

i can find out row means that is we have

seen earlier by calculating the mean or

average so i can do that

and i can find out the value of average

now what i can do is i have got the

average for every row and

what we can do is we can basically

do a column bind

by using a c bind function

and i will say

i'm going to take the total stock which

has three rows and then get the average

and now let's look at the total stock

which shows me the average value which

is the new column which has been added

to the matrix so these are some simple

very simple operations which you can do

but that gives you good insight in what

can be done at a matrix level where your

data is arranged in multi dimensions
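a sketch of the rbind and cbind steps. the prices and the names stock3 and avg are assumptions made for this example.

```r
stock.matrix <- matrix(c(450, 451, 452, 445, 468,
                         230, 231, 232, 236, 228),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(c("Stock1", "Stock2"),
                                       c("Mon", "Tue", "Wed", "Thu", "Fri")))

stock3 <- c(150, 151, 149, 120, 140)         # one value per existing column
total.stock <- rbind(stock.matrix, stock3)   # the variable name becomes the row name

avg <- rowMeans(total.stock)                 # one average per row
total.stock <- cbind(total.stock, avg)       # added as a new column named avg
total.stock
```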

now how do we do a selection and

indexing in matrix so in vectors we were

using either names or we were using

positions or we were using indexing now

here let's create a matrix called

student

and we are using the matrix function but

within the matrix function we are using

the c function to create a vector

which will pass in all the values which

also has n a values if you closely

notice

we will split these values into number

of rows is four

so that means the values the number of

values in this vector should be a

multiple of four i am saying columns is

4 and i would want to arrange this data

row wise so i've done that and if you

would want to name the dimensions of

this you can use dimnames so what i'm

doing here is

on my student

i am assigning a list

which will basically have these names

which are basically assigned and now if

you look at your student it basically

shows me

the values which were first

applied to the row names that is john

matthew sam and alice

and then you have one more vector which

goes as the column names for the values

so you have not only created a matrix

by using a vector by defining your

dimensions that is number of rows and

columns you have arranged the data in a

row order and what you have also done is

using a list function you have passed in

the values which will be applied as row

names and column names to your matrix
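a sketch of the student matrix. only the scores that are read out in the video (john's row, and the values for sam and alice that are mentioned) are taken from the transcript, the remaining numbers and the short column labels are assumptions.

```r
student <- matrix(c(20, 30, NA, 70,
                    22, 28, 36, 80,
                    24, 26, 32, 75,
                    26, 24, NA, 50),
                  nrow = 4, ncol = 4, byrow = TRUE)

# first vector in the list names the rows, second names the columns
dimnames(student) <- list(c("John", "Matthew", "Sam", "Alice"),
                          c("Phy", "Chem", "Bio", "Maths"))
student
```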

now how do we extract particular columns

here so we can take our matrix and we

can just say comma 1 and that basically

gives me the values for

john matthew sam and alice and what we

are looking at is the first column

now i can also say from first column

onwards i would want to look at how many

columns so i can do this and now here

i'm selecting first and second column

i can also be

using a vector function here and

that also does the same thing where i'm

saying 1 comma 2 and i'm getting the

values from first and second column so

third is not included here

now if you would want to do row wise

then you have to give the row position

first so if i do a student 1 that gives

me the row values

and this is giving me values for

my student which we are seeing here so

for john we have 20 30 na and 70

and that's what we get here when you do

a row wise operation you can also do a

row wise and how many rows do you want

you can use the vector function to do

that

you can also select or slice out a value

where you are getting an intersection of

row 2 and column 2 and then you can also

start from a particular position and

then onwards get your rows

so these are different ways in which you

are slicing the values from your matrix

by

columns or by rows

so

at this point of time let me just type

in student here and let's look at the

value of student

and then here we are interested in 3

colon 4

and then 2 colon 3 so what does that

give me so you are looking at

third to fourth row so you're looking at

sam and alice

and then you are looking at columns two

and three so that basically gives you

your 26 32 24 and n a

so first is you are giving your row

positions or how many rows you want and

then you are giving your column so

similarly you can do this you can say

from row number 2 to 4

and then column wise you can say 1 to 2

so if we do this so this tells me two

columns which is first and second and it

shows me rows which is

from second to fourth

so in this way we can extract

data based on rows and columns now if we

would be interested in finding out a

specific value so for example if i again

bring up student

this is my student and what i would be

interested in is getting the value of

john

and for specific subjects

so maybe we are looking for

2 colon 3 now if i do this

it shows me for john

and what we are interested in

is 2 colon 3 so that gives me the value

for chemistry and biology so you are

giving the columns so row wise you have

already specified the name and that

basically selects the particular row i

could have given a number and chosen

which row or which rows we would want to

pull out the values now if i would want

to find out the value for john and sam

now in that case

i could use indexing or positioning but

that has to be continuous but here you

are talking about john and sam which has

matthew in between so we will basically

create

we will get the values for john and sam

and then we will look at

column 4

now

that is basically giving me the values

in the fourth column which is 70 and 75.

similarly

if you go further you can look at maths

and bioscore of sam and alice

so you will give your

row names that is sam and alice and then

you would want the values for maths and

bio so that is basically your

third and fourth column and we can do

that by looking at the values
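the selections walked through above could look like this. the matrix repeats the partly assumed student scores, so treat the numbers as illustrative.

```r
student <- matrix(c(20, 30, NA, 70,
                    22, 28, 36, 80,
                    24, 26, 32, 75,
                    26, 24, NA, 50),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(c("John", "Matthew", "Sam", "Alice"),
                                  c("Phy", "Chem", "Bio", "Maths")))

student[, 1]                   # first column, all students
student[, 1:2]                 # first and second column
student[1, ]                   # first row (John)
student[2, 2]                  # intersection of row 2 and column 2
student[3:4, 2:3]              # Sam and Alice, Chem and Bio
student["John", 2:3]           # a row by name, columns by position
student[c("John", "Sam"), 4]   # rows that are not next to each other
student[c("Sam", "Alice"), c("Bio", "Maths")]
```

the position before the comma always picks rows and the position after it picks columns.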

how do you find out an average

well

that's pretty simple you can use the

mean function

on student

you

will select your row name that is john

you also want to get rid of n a values

otherwise that will give a problem so

you get rid of that by saying n a dot r

m

equals true and then you

get the average score of john now

how do i do further computation that is

if i want to find out the average and

total score of all students so in this

case

i can apply or i can use an apply

function

here i'm saying i'm working on student

and

we would want to give the margin

that is 1 for rows

and we want to also give the function so i

want to find out mean

i want to remove or get rid of the n a

values and now if i

look at help apply it tells me how

the apply function works over the array

margins so i will do an apply function

on student

where i would want to work along margin one

that is row by row

i would say i want the sum

and i want to get rid of the n a values

so this gives me the sum for each

student

and here we are getting a mean value

which was for

each student

so what we are doing here is for example

let's look at student again just so that

we avoid confusion so we have

student

and then we have physics chemistry bio

maths and i have said

margin one that is rows so basically what we want is

for john we want

the total

and what we can do here is

we can say 20

plus 30

avoiding n a and then 70 that gives me

120

then you look at matthew so this is

again doing a totaling there is no n a

value and you look at the value

right so when we have chosen apply

function we have worked on student

now here we are interested in the values

that is sum of all the values for this

particular row

i'm saying take care of n a and then

give me a sum similarly you did a mean

and that was giving you a mean for each

student
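a sketch of the mean and apply calls described above, again on the partly assumed student scores. the names avg.score and total.score are only for illustration.

```r
student <- matrix(c(20, 30, NA, 70,
                    22, 28, 36, 80,
                    24, 26, 32, 75,
                    26, 24, NA, 50),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(c("John", "Matthew", "Sam", "Alice"),
                                  c("Phy", "Chem", "Bio", "Maths")))

john.avg <- mean(student["John", ], na.rm = TRUE)   # 40, the NA is skipped

# MARGIN = 1 applies the function to each row, MARGIN = 2 to each column
# extra arguments such as na.rm are passed on to the function
avg.score   <- apply(student, 1, mean, na.rm = TRUE)
total.score <- apply(student, 1, sum,  na.rm = TRUE)

total.score["John"]   # 120, that is 20 + 30 + 70
```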

so these are some simple operations now

what we can also do is

we can basically create a vector called

passing score

and what we would want to do is we want

to get the values or find in how many

subjects alice has passed how do we do

that we will have to compare

alice score

which should be greater than

or equal to the passing score so what we

can do is we can create a variable here

pass now i am saying student i would be

interested in the values for alice so

i've mentioned that row name here i'm

then comparing it with passing score

which we have created here and that will

give me the values wherever

alice has passed in a particular subject

now i can obviously get rid of the na

values and then look at this which

basically tells me

there was

one subject in which alice passed and

rest were either false or n a

now same thing we can do for sam

so sam is here

and what we want to do is we want to

look at the values here so we will say

let's do the same thing for sam and find

out the comparison with passing score

and get rid of n a values so you are

basically extracting value so these are

some

easier operations and usage of functions

on your matrices

which are filled in with values at row

level and column level and then you can

apply one of these functions

or

multiple functions to basically extract

value which makes more meaning
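a sketch of the passing score comparison. the passing marks are hypothetical, chosen so that alice passes exactly one subject as in the video.

```r
student <- matrix(c(20, 30, NA, 70,
                    22, 28, 36, 80,
                    24, 26, 32, 75,
                    26, 24, NA, 50),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(c("John", "Matthew", "Sam", "Alice"),
                                  c("Phy", "Chem", "Bio", "Maths")))

passing_score <- c(25, 25, 25, 70)   # hypothetical pass mark per subject

pass <- student["Alice", ] >= passing_score   # TRUE FALSE NA FALSE
alice.passed <- sum(pass, na.rm = TRUE)       # 1, TRUE counts as one

sam.passed <- sum(student["Sam", ] >= passing_score, na.rm = TRUE)
```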

so that's with your matrix now let's

also look at data frames now data frames

as we know

is basically data which has been ordered

in rows and columns

wherein we can assign row names we can

assign column names we can do some

operations on data frames so let's look

at example so if i do a data

here so that gets me

some sample data sets or functions what

we have here

so let's do

once we have our data here

so it says use data package and then you

can get

list all the data sets in available

packages and you can basically look at

all the r data sets which we are seeing

here it has opened up so i would be

interested in getting the air passengers

data so i'm going to pass that in the

data function

and then if i do a head to see the

initial data from air passengers it

shows me the values what we have

similarly we can do that on iris data

set and look at the head values

i can

do a view to look at specific values in

a tabular format if that makes more

meaning and that makes it easy for

analysis

now i can

do a view on state

x77 and that basically shows me

the population income and all this for

different u.s states so these are some

different data sets what we have

you can do a view on them

to basically understand the data or look

in a more readable format you can just

do a tail to get some end data so head

and tail functions just give you the first

six entries

or the last six entries from that

particular data set now the question is

how do we

work on this data so i can get a

statistical summary so i have the iris

data set which we had here

so if i do a head it shows me iris data

set this is a popular data set which

shows the petal length sepal length of

particular flowers and the species what

is the length what is the width and what

species does that flower belong to okay

now here we can get a summary that is

statistical summary of a data set which

gives me min first quartile median mean

third quartile and maximum values

it basically shows you the count of

the entries for each species what we

have under the species column now what i

can do is i can check the structure of

this data set using str
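the built in data set steps above can be sketched as follows. view is left as a comment because it opens an interactive viewer.

```r
data(iris)              # load a data set that ships with r
first.rows <- head(iris)   # first six rows
last.rows  <- tail(iris)   # last six rows

summary(iris)   # min, quartiles, median, mean, max, and counts per species
str(iris)       # number of observations, variables and their types

data(AirPassengers)
head(AirPassengers)

# View(state.x77)   # opens the us states data in a table viewer
```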

i can create a data frame now of this

data

using the data.frame function so for

that we need to also have

say for example if we would want to

create a data frame let's see how do we

do that so first we create a vector of

days

we can create a vector of temperatures

and rain

and then we want to create a data frame

out of this so i use the data dot frame

option

i pass in my days temp and rain as the

vectors and now if you look at the data

frame you basically see

that i have my days my temp and rain so

those were the variables those were the

vector names and those have become the

column names row names are auto assigned

and basically we are seeing the values

which have been passed in my data frame

now i can do a summary on this to

basically look at what is the length or

how many values we have in data frame

what is the class of elements so that is

character

you are looking at

your

values or summary which gives you min

first quartile median mean and so on and

then it also shows you the complete data

on rain

what is the mode here

how many false or how many true

values we have you can also look at the

structure of this data frame by doing a

str

which gives me

how many observations we have how many

variables we have

what are the different variables so that

is days temperature and rain and the

values for those
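a sketch of building this data frame from three vectors. the temperatures are the ones read out later in the video, the day labels and the rain values after the first are assumptions.

```r
days <- c("Mon", "Tue", "Wed", "Thu")
temp <- c(25.6, 30.1, 40.0, 37.3)
rain <- c(TRUE, FALSE, FALSE, TRUE)   # values after the first are assumed

# the vector names become the column names, row names are assigned automatically
df <- data.frame(days, temp, rain, stringsAsFactors = FALSE)

df
summary(df)
str(df)
```

stringsAsFactors = FALSE is set explicitly so that days stays a character column on older r versions too.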

for days if you notice it is of the type

character temperature is numeric rain is

logical now how do we do data frame

indexing so

like your matrix which basically has

rows and columns and in multi-dimensions

similarly in data frames also you have

indexing so you can do a data frame so i

could just extract the first row by

doing this and that basically gives me

the value so you can always compare it

by just typing df so that's my data

frame

and now let's look at the values extract

the first row and that shows me monday

25.6

rain value is true

now i can also do it column wise so

for example i could do it in this way so

here what i'm doing is

extracting the second column from this one

so it tells me

25.6

30.1

40.0 37.3 so you have extracted the

values for the second column

now

selecting using column names so that's

the easiest way to extract the values

for a particular column so i can just do

this instead of giving the position of

the column or the column number i'll

give the column name

and that gives me all the values of

temperature

and

if i do this where i'm saying 2 colon 4

and then i'm giving the columns so it

gets me the second

third and fourth rows for day and

temperature

and we are looking at the value so you

have given your row positions and then you

have selected your columns you can also

do a dollar sign

if you would want all the values of a

particular column so i can just do a df

dollar days or df dollar rain and it

shows me

the values from my data frame now one

more way of doing that is using your

bracket notation to return a data frame

format of same information so if you

want the resultant data in a data frame

format

you can just do a df rain or df

temperature and that is basically giving

a data frame so if i had assigned this

to a variable and if i had looked at the class

of this that would be data frame
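the indexing options above, sketched on the same partly assumed weather data frame.

```r
df <- data.frame(days = c("Mon", "Tue", "Wed", "Thu"),
                 temp = c(25.6, 30.1, 40.0, 37.3),
                 rain = c(TRUE, FALSE, FALSE, TRUE),
                 stringsAsFactors = FALSE)

df[1, ]                       # first row
df[, 2]                       # second column as a plain vector
df[, "temp"]                  # the same column by name
df[2:4, c("days", "temp")]    # rows 2 to 4 of two columns
df$days                       # dollar sign gives the column as a vector
df["rain"]                    # single brackets keep the data frame format
class(df["rain"])             # "data.frame"
```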

now

one of the things which we also require

is filtering data frames using a subset

function

so that is subsetting the information

from a data frame so we know we have our

data frame let's look at our data frame

again

so that just reminds of what data values

we have

and here let's get a subset out of it

using the subset function so i'm passing

in my data frame i am saying i would be

interested in the rain column so i am

giving subset rain column and

wherever the values are true so it returns

all the rows where it has

rained similarly i can

do a subsetting by giving a value for

temperature wherever the value is

greater than 25 and that shows me the

value so this is where you are filtering

the data in data frames using a

subset function to which you have to

provide a column name

and then giving a condition now

one more important thing which might be

required is sorting your data frame

using order function so i can create a

variable by name sorted dot temp

i want to do a ordering of data frame

and here i am doing ordering based on

temp

and now if i look at the value

or i can create this

in an ascending order

so let's look at the values and now if i

look at sorted dot temp it just gives me

the

order or the ranking for the particular

values

so we have discussed this in other

section also so what i can do is

i can return all the columns with

temperature sorted in a descending order

so right now what we were seeing was we

were seeing in ascending order but what

we can do is we can do that in a

descending order so here i'm creating a

variable descending.temp

i'm doing an ordering but when i'm doing

a ordering i'm using the minus symbol

and this one

if you would look at in the form of a

data frame it shows me the values which

are ordered in a descending order based

on the temperature column

now another way of sorting is by using a

particular column

so what i can do is i can sort i can do

a order and then i can choose the column

based on which i would want to order it

and then

if you would want to get the values of

this so it tells me

the values have been ordered based on

temp

so this can be very useful when you

would want to sort the data or order it

in a particular way to basically

understand your data or to make more

meaning out of it right
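a sketch of the filtering and sorting steps above, on the partly assumed weather data frame. the result names are only for illustration.

```r
df <- data.frame(days = c("Mon", "Tue", "Wed", "Thu"),
                 temp = c(25.6, 30.1, 40.0, 37.3),
                 rain = c(TRUE, FALSE, FALSE, TRUE),
                 stringsAsFactors = FALSE)

rainy <- subset(df, subset = rain == TRUE)   # rows where it rained
warm  <- subset(df, subset = temp > 25)      # rows with temp above 25

sorted.temp <- order(df$temp)     # row positions, lowest temp first
asc.df  <- df[sorted.temp, ]      # use the positions to reorder the rows

desc.temp <- order(-df$temp)      # the minus sign reverses the order
desc.df <- df[desc.temp, ]
```

order only returns positions, so the data frame has to be indexed with them to see the sorted rows.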

similarly one more requirement might be

merging your data frames

so here i'm creating a data frame so i'm

saying authors

and i'm using data.frame function and

what we are doing is

instead of creating three vectors i am

basically doing that within my data

frame function so let's do that

and now what we can do is at this point

of time i can check what my authors look

like so this is my authors

now here if you see we have

the vector tukey venables tierney

ripley and mcneil so that becomes my

first column

which is surname

then you have your nationality and then

you have deceased

where you have also

repeated the values four times right so

that's something new which you might be

seeing so you are creating a vector

where you are passing in a value and for

other set of values you are basically

using the rep function

now similarly we can create a data frame

called books

and this one is

where i am

having name column title

and then i have other dot author and you

are passing in the values so at this

point of time if you would want to look

at your books

it would look something like this so you

have given a name now just closely look

at the data frame function so here you

are using

the names

you have the titles whatever values you

are passed in

always remember when you have multiple

vectors they are ending with a comma

right so do not forget that and then you

have other dot author so that's the name

of the column and you are passing in the

values where you have also passed some n

a values

and at this point of time you can look

at authors

this is your books

and our intention will be to merge

these data frames so that's what we

would want to do

might be we are interested in getting

the data together so what i'm doing here

is i'm saying m1 now i want to use the

merge function i pass in my data frames

that is authors and books

so if we closely look at authors it has

three columns and five rows and here you

have three columns and we have seven

rows

so we would want to do a merge so we

will say author's books and we will say

by dot x so this is where i am choosing

which is the column based on which

i would want to merge so i have by dot

x which is surname

and by dot y which is name

so we would want to merge the data where

we are giving a condition based on

values and surname and name so you see

there is tukey here there is tukey

here we have venables we have venables

we have tierney we have this one we have

ripley which we have here multiple

entries and then you have mcneil

now we don't have r core which is

there in your books in

your authors so let's see

what happens when we do a merge here

okay and now we see the result of this

merge where it has taken all the values

from

both the data frames so you have surname

nationality deceased you get the title

you get the other dot author which you

are getting in from your books

and the name column is avoided right

because we are

doing

the merging based on surname and by dot y

is name so we don't see the

name column but what we are seeing here

is the values which have been merged and

then you can compare so for example

let's do a random check so if i look at

mcneil

that's the surname

or here it was name so you have mcneil

you have a nationality which comes from

the first data frame deceased from the

first data frame

then you have your interactive data

analysis

and then you look at other.author

what you don't look at

in the merge is this r core because this

does not have any value in your author's

data frame so you can do a merging of

your data frames using the merge

function so please try it out and you

can create different data frames and try

to use this
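a sketch of this merge. the data appears to follow the authors and books example from r's own help page for merge, so the nationalities and titles below are taken from that example as best recalled and should be treated as illustrative.

```r
authors <- data.frame(
  surname     = c("Tukey", "Venables", "Tierney", "Ripley", "McNeil"),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  deceased    = c("yes", rep("no", 4)),       # rep repeats "no" four times
  stringsAsFactors = FALSE)

books <- data.frame(
  name  = c("Tukey", "Venables", "Tierney", "Ripley", "Ripley", "McNeil", "R Core"),
  title = c("Exploratory Data Analysis", "Modern Applied Statistics",
            "LISP-STAT", "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis", "An Introduction to R"),
  other.author = c(NA, "Ripley", NA, NA, NA, NA, "Venables & Smith"),
  stringsAsFactors = FALSE)

# match surname in authors against name in books
m1 <- merge(authors, books, by.x = "surname", by.y = "name")
m1
```

by default merge keeps only rows that match in both data frames, which is why r core drops out. passing all = TRUE would keep unmatched rows and fill the gaps with NA.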

similarly you can manipulate a data

frame so for example here we are

creating one more data frame called

sales report

which is data dot frame you are giving

an id product has some values unit price

is where you are

getting the values as integer and

quantity as integer so now if i look at

my sales report this is the values which

i have let's spend a couple of seconds

to look at this value so id value is 1 0

1

2 1 0 10

product

is a b so that is automatically assigned

unit price is starting

where you say 101

140

184 right so we are using a as dot

integer we are converting it into

integer and basically we are

assigning these values here

for your

unit price and similarly for quantity we

are assigning the values by doing a as

dot integer and then just doing a runif

now once we have done that we have

created a data frame now how do you

transpose what do you mean by transpose

so transpose is when you are changing

your axes so if i do a transpose on

sales report and if i want to do a view

so you will see

the positions which have changed so you

have all these values so my

row names or row whatever values become

the column headings

and basically your column headings

become your row names so that is what

you're achieving by doing a transpose
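a sketch of the sales report and its transpose. the column names, the id range and the runif limits are assumptions, and set.seed is added only so the random numbers repeat.

```r
set.seed(1)   # assumption: fixes the random draws for this example

sales.report <- data.frame(
  Id         = 101:110,
  Product    = c("A", "B"),                        # recycled down the ten rows
  Unit.Price = as.integer(runif(10, 100, 200)),    # random whole numbers from 100 to 199
  Quantity   = as.integer(runif(10, 10, 20)))

trans.report <- t(sales.report)   # columns become rows and rows become columns
dim(trans.report)                 # 4 10
```

because the columns have mixed types, t returns a character matrix rather than a data frame.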

you can do a head to look at some

initial values

you can do a sorting of this data frame

by using the order function and you can

choose the column

and also the order if you would want to

have it in ascending or descending or

basically increasing or decreasing

values

you can also choose a particular column

like we are choosing product as a column

and i would want to

take the values of sales report in a

descending order that is unit price

and we can just do ordering of data

frames or sorting the values and data

frame so this is pretty easy please

spend some time in practicing these

things taking these examples

and you will learn more about these

functions you can always try creating

an example at your end and you can try

to look into these

now

what about subsetting the data frame so

when you are saying subsetting the data

frame

let's do a subset function like what we

used earlier

i will say subset dot product a i'm

using the subset function and here i

will get the subset based on the product

value being a

let's look at this and this shows me

only the values where product value

matches a

now extract the rows

for which product is a and your price is above

150 so you are still doing a subsetting

you are still passing your data frame

here you will give the product as a

which will tell basically the values for

product and unit price greater than 150

so you're giving some conditions and

look at the values

now if you're only interested in

particular columns so if i say

only the first and the fourth column

product is a

and unit price is above 150 so you have to

still use your subset function

pass in your data frame

product will be given as a and unit

price should be greater than 150 but

what i am interested in is the values

from the first and the fourth column and

now if you see it shows me the values

for my first and fourth column
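the three subset calls above, sketched on a small sales report with fixed made up values so the results are easy to check. column names are assumptions.

```r
sales.report <- data.frame(
  Id         = 101:106,
  Product    = c("A", "B", "A", "B", "A", "B"),
  Unit.Price = c(120L, 180L, 160L, 140L, 190L, 155L),
  Quantity   = c(12L, 15L, 11L, 18L, 14L, 16L),
  stringsAsFactors = FALSE)

# rows where product is A
subset.ProductA <- subset(sales.report, Product == "A")

# rows where product is A and unit price is above 150
a.over.150 <- subset(sales.report, Product == "A" & Unit.Price > 150)

# same condition, but keep only the first and fourth column
a.cols <- subset(sales.report, Product == "A" & Unit.Price > 150, c(1, 4))
```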

what we can also do is we can create two

subsets so set a from data frame where

we take the product is being a other one

is being b

and then we can look at the values so

this is just a this is just b and what

we can do is we can combine them or we

can merge them using column bind so when

i say column bind and i'm saying set a

set b so it is basically going to stack

the data frames column wise and if you

do r bind it is going to stack the data

frames row wise

so we can either use

column or we can do a row wise

so this is in one way where you can

merge the previous example where we saw

merging was based on a particular

condition which is met

based on some columns which might have

similar values right and this is where

you are straight away merging the data

frame using column bind and row bind so if

you compare this

with the other merge operation what we

saw here this was where you are

comparing the values

of first data frame

and second data frame and then merging

but here we have just used column bind

and row bind so we are not merging on a

particular condition we are just

stacking them either column wise or row

wise
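a sketch of stacking two subsets with cbind and rbind, on made up sales values. the names setA and setB follow the video, the result names are assumptions.

```r
sales.report <- data.frame(
  Id         = 101:106,
  Product    = c("A", "B", "A", "B", "A", "B"),
  Unit.Price = c(120L, 180L, 160L, 140L, 190L, 155L),
  Quantity   = c(12L, 15L, 11L, 18L, 14L, 16L),
  stringsAsFactors = FALSE)

setA <- subset(sales.report, Product == "A")
setB <- subset(sales.report, Product == "B")

side.by.side <- cbind(setA, setB)   # columns of setB placed next to setA
stacked      <- rbind(setA, setB)   # rows of setB placed under setA
```

cbind needs both sets to have the same number of rows and rbind needs the same columns. neither of them checks whether the values in the rows actually match, which is the difference from merge.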

now

what we can also look at is doing some

aggregate operations this is going

deeper into data frames so

when you use aggregate function you are

passing in your data frame you are

choosing the quantity column

and then

you are basically

using the list function so list function

is going to work on your data frame on

the product column so product column for

your sales report so at this point of

time let's look at sales

report

and let's look at the value here so this

is my sales report

and what we want to do is we want to

aggregate the values on quantity column

but for that i will say i will just take

the product columns

and i will get a sum

wherein i am ignoring the n a values

let's look at this

and that gives me an aggregation value

so remember aggregate function

is doing a summing up

now here we are doing a summing up on

your

product

that is sales report product column is

what we have so you are kind of grouping

by based on product so we have two

products here a and b

now what we also want to do is we want

to take the quantity column so that's

why we have given that first and what we

are doing is we are doing a summing up

so we are summing up all the values for

a and all the values for b

and we are seeing that

here if there are any n a values we are

ignoring it so these are some basic

operations on data frames or matrices

subsetting them extracting useful

information

using some inbuilt functions to do

transformation or computation and

extracting some values
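a sketch of the aggregate call described above, on made up sales values. the label Prod for the grouping column is an assumption.

```r
sales.report <- data.frame(
  Id         = 101:106,
  Product    = c("A", "B", "A", "B", "A", "B"),
  Unit.Price = c(120L, 180L, 160L, 140L, 190L, 155L),
  Quantity   = c(12L, 15L, 11L, 18L, 14L, 16L),
  stringsAsFactors = FALSE)

# sum the quantity column, grouped by product, ignoring any NA values
qty.by.product <- aggregate(sales.report$Quantity,
                            list(Prod = sales.report$Product),
                            sum, na.rm = TRUE)
qty.by.product
```

the summing comes from passing sum as the function. aggregate itself only does the grouping, so mean or max could be passed in the same place.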

now similarly we can also work on lists

now that we have looked at data frames

matrices vectors let's also look at one

more structure and how we work

in r when we have to work on lists

so list

is basically a structure here and what

we are doing is we are creating a list

by using the list function

and here

i am

passing in three vectors you see here

now c

function is being used now in vector we

know that all the elements are of the

same type now let's create a list

wherein we see three vectors which are

of three different

types or objects of three different

types so let's create this list

and now let's look at our list so it

basically has elements

wherein you have values of different

types

we can create a different

list which can also have

sequence elements that is 1 to 10 a

matrix which is three by three and

then also passing a list so this is also

one way of creating a list

let's look at list two and if we look at

the values here list two basically has a

vector which has values one to ten it

has a matrix of three into three

it has a list which has values a

having 10 and b having 20.
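
A minimal sketch of the two lists described above. The element names `vec`, `mat` and `lis` and the exact values in the first list are assumptions made for illustration:

```r
# A list whose three elements are vectors of three different types
list1 <- list(c(1, 2, 3), c("a", "b", "c"), c(TRUE, FALSE, TRUE))
print(list1)

# A list holding a sequence, a 3 x 3 matrix and a nested list
list2 <- list(vec = 1:10,
              mat = matrix(1:9, nrow = 3),
              lis = list(a = 10, b = 20))
str(list2)
```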

so this is how you can create a list

which can have objects of different

types so we can also

use

recursive variable a variable that can

store value of its own type so for that

you have to use the is.recursive function

something like this so i'm saying is.recursive

and then do it on your list

and

we can check if the list basically has a

variable that can store values of its

own type now

one of the main requirements when you're

working with list

is

indexing so i have created a list and

here i can access these elements by using

an index so if i do this this shows me

the matrix what i could have also done

is using the dollar symbol and then

choosing

particular element of the list by doing

a mat which is the name given to our

matrix

or by choosing a name that is vector

so

you can access the elements using

indexing or dollar notation or giving

the name of a particular element now i

can also work on list and i can get the

third element's second value so we can do

that and that shows me 20 or you could

have done by giving the value 3

that is the third element and within

that you are looking for second element

so i can get the length of the list i

can get the class of the list which

shows me this type list
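
The indexing options mentioned above can be sketched as follows, again assuming a list with the hypothetical element names `vec`, `mat` and `lis`:

```r
list2 <- list(vec = 1:10,
              mat = matrix(1:9, nrow = 3),
              lis = list(a = 10, b = 20))

is.recursive(list2)   # TRUE: a list can hold other lists

list2[[2]]            # second element by position (the matrix)
list2$mat             # same element using dollar notation
list2$vec             # the vector, by name

list2$lis[[2]]        # second value of the third element: 20
list2[[3]][[2]]       # same value using positions only

length(list2)         # 3
class(list2)          # "list"
```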

and

what i can also do is i can convert

vectors into list

so here we are creating a variable price

which is being assigned a vector which

has 10 20 and 30

and now what i want to do is i would

want to convert this vector into list

and for that i'm using the as.list function

so i am creating a variable called price

list

and then i am saying as.list so

that's going to convert my vector into

list and now let's look at price list

which shows me

a list

or you can look at price which is a

vector

so that's when you are converting your

vector into list now how do you convert

your list into vector

and that also can be done by using the

unlist function

so i can basically work on price list

wherein we converted vector to list and

i can just do a unlist on that which

will convert my list into a vector

looking at the values of the vector

now sometimes we may want to get the

dimensions so we can use the dim

function to convert the vectors to a

matrix so that it can have multiple

dimensions

so here we create a vector which has

four values and then i am going to give

a dimension to this so that it is

converted from vector into matrix by

giving dimensions 2 comma 2 and now if

you look at price 1 it has basically

changed into rows and columns of two

into two dimensions so these are some

simple examples of working with list
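
A short sketch of the conversions just described. The variable name `price_list` and the four values in `price1` are assumptions:

```r
price <- c(10, 20, 30)

# Vector to list: each value becomes its own list element
price_list <- as.list(price)
class(price_list)                 # "list"

# List back to vector
price_vec <- unlist(price_list)
class(price_vec)                  # "numeric"

# Giving a vector dimensions turns it into a 2 x 2 matrix (filled by column)
price1 <- c(10, 20, 30, 40)
dim(price1) <- c(2, 2)
print(price1)
```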

now when you talk about basic data type

functions

we have seen how you use the assignment

operator

how you get the data type of a

particular variable or the class to

which it belongs

i can assign different values

such as 10.5 so the previous one was

showing me the value numeric

and

now what we would want to do is we want

to assign a value 10.5 look at the class

of it it says numeric type of it shows

double so by default

it belongs to the double class now i can

check if

the values in n1 are numeric and that

shows me true and similarly for n2 and

that shows me 2. so you are using the

numeric function which returns true if

the given value is numeric

similarly we can

have

integer

assigned to a particular variable and

for that either i can do as.integer

or i can assign a value with a capital L suffix

so i can do this and look at the value

of i1

similarly

i2 and look at the values and if i would

want to check if that is an integer

let's look at the values of

i2

which was an integer i1 which was an

integer and i3

which is an integer

so here we have assigned integer values

to a particular variable now all

integers are numeric but all numerics

are not integers so let's check that so

if i do an is.numeric on i1 which was

assigned as.integer 10 that shows me

true

if i say is.integer on i1 so was

that an integer

and if i look at the value it shows me

true
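
The numeric and integer checks above can be sketched like this. The names `n1`, `n2`, `i1` and `i2` follow the narration; the values other than 10.5 and 10 are assumed:

```r
n1 <- 10
n2 <- 10.5
class(n2)         # "numeric"
typeof(n2)        # "double"
is.numeric(n1)    # TRUE
is.numeric(n2)    # TRUE

# Two ways to get an integer: as.integer() or the L suffix
i1 <- as.integer(10)
i2 <- 20L
is.integer(i1)    # TRUE
is.integer(i2)    # TRUE

# All integers are numeric, but not all numerics are integers
is.numeric(i1)    # TRUE
is.integer(n1)    # FALSE: 10 without the L suffix is stored as a double
```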

now let's look at the character values

so if we say c1

c2 and look at the class of this it

shows me this is of character type

similarly on c2 and you can always

validate that

by using the is.character function

you can also use some inbuilt functions

such as converting to upper case or

getting a substring from the starting

position till the position you want

i can do a paste function

which basically will give me

the data

combined or you can say concatenated you

can also use paste0 which we know

will get rid of the space

and it just

concatenates them without a space i can

also use a specific separator which we

have seen examples and we can do that

and what we can also do is we can

replace set of characters

so here

i am saying

substitute

and then if i look at the values it has

basically replaced rob with cena

and let's look at the length of it or

number of characters in this so these

are some basic operations what you're

doing on matrices on your data frames on

your list

and also on your variables where either

you are assigning them values of a

particular type or you are changing the

data types you can also go for coercion

in case of vectors we have seen that

where if you are passing in values of

different types that's coerced into same

types
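
A sketch of the character helpers mentioned above. The strings are made up, and using `sub()` for the replacement step is an assumption, since the narration only says "substitute":

```r
c1 <- "hello"
c2 <- "world"
is.character(c1)             # TRUE

toupper(c1)                  # "HELLO"
substr(c1, 1, 3)             # "hel"

paste(c1, c2)                # "hello world" (default separator is a space)
paste0(c1, c2)               # "helloworld"  (no separator)
paste(c1, c2, sep = "-")     # "hello-world"

# Replace the first match of a pattern (hypothetical sentence)
sentence <- "my name is rob"
replaced <- sub("rob", "cena", sentence)
nchar(replaced)              # number of characters in the result

# Coercion: mixed types in one vector are all converted to character
mixed <- c(1, "a", TRUE)
class(mixed)                 # "character"
```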

so later we can learn more on functions

and flow control and how that is handled

in r

let's learn how r can be used to take

care of flow control that is if i would

want to have a if else condition

and if what i would want to compute or

if i would want to check some values how

r can be used

so here

if

statement consists of a boolean

expression which is followed by one or

more statements so we can just say if we

can pass in a boolean expression where

we would want to compare particular

value or we would want to check a

particular value and then whatever is

passed in the statement will get

executed so what we can do is here we

can use assignment operator i can pass a

value to x now we can always do a typeof

and that can tell me that x

is basically an integer and now i can

use my if where i can say

is

dot and then i can choose integer

and i would want to check the value of x

if that is an integer

then i will just use

brackets and i'll pass a statement here

so let me say print

and let's say x is an integer

and we can execute this and this tells

me that the boolean value is true now if

for example we would have done something

else or

say for example

instead of integer if i had used let's

say character for that matter

and we can check the value and we can do

this

so

here we will check the values and it

says

there is an error with the bracket and

let's check this one so if x because we

missed a bracket here

so let's do that one

and then try this and it doesn't show me

any result so how could we handle

something like this if

the boolean expression does not match to

true and in that case we can always go

for else statement so we can check for a

value so if the boolean expression is

true statement will be executed and if

it is false then next statement will be

executed so we could have done the same

thing here where i said print x is an

integer which we know is not true and

what i could do is i can here after this

one

say else

and then i can open up one more bracket

and then i can say print

and i will say x

is not

a character

and now we know that x is not a

character so this is a simple way where

you can use if else and you can control

the flow

by passing in the conditions now that's

when you are using if else statements
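
A minimal sketch of the if and if else checks walked through above. The value `30L` is an assumption; any integer would behave the same way:

```r
x <- 30L
typeof(x)                    # "integer"

# The body runs only when the condition is TRUE
if (is.integer(x)) {
  print("x is an integer")
}

# With else, the second block runs when the condition is FALSE
if (is.character(x)) {
  print("x is a character")
} else {
  print("x is not a character")
}
```

Note that `else` has to sit on the same line as the closing brace of the `if` block, otherwise R treats the `if` as already complete when the code is run line by line.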

now what about while loop so that also

can be useful when you are programming

in r

so

an else statement is executed when the

condition in the if statement results to

false so that basically means what we

can do here is

let's pass in a word or a set of words

like this for example let's say v

and then we use c function to create a

vector for example and then i can just

say hello

world

and if you look at v

you can look at the class of v

it's of characters and if you look at

type

of v

it is

having the objects or elements as

character now what we can do is we can

basically then say

count

and let's assign this a value of 2

now what we would want to check is is

the count of elements in

our v equals to two so what i can do is

while

my count is less than

say five

now i'm saying

i would want to

do something while the count is less

than 5.

so we have already given a value to

count as 2

and now what i can do is here i can open

up a bracket i can say print and then

pass the value of v and then what we do

is not only this we will also increment

the value of count and we will say count

plus one

and

here it gives me error probably because

we have missed a bracket so let's see

what we are missing out here so

let's just check this one again

so here it is

we have created v

which has two elements

of the type character

and then what we do is we

assign count a value of two and we would

want to check while the count value is

less than 5 we would want to print the

value of v so what we are doing here is

we are saying while then you pass in an

expression which will check the value of

count we do a print and then we

increment the value of count now this is

a simple example where you are using

while

to basically test an expression and

while that expression is true

you would be doing something whatever is

passed within your

brackets
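
The while example above, written out as a sketch:

```r
v <- c("hello", "world")
count <- 2

# Runs while count is 2, 3 and 4, so v is printed three times
while (count < 5) {
  print(v)
  count <- count + 1
}
```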

now we could also be going for for loop

now for loop is basically used to

iterate over a list of elements or a

range of numbers

so for example if i have a vector like

fruit which has some values i could just

say for i in fruit i would want to print

something so let's try this also as an

example to test our for loop now we can

just say names

and we can basically

then assign values to this so let's say

vj

aj

dj

and let's say sj

and let's create this let's look at the

value of names now what i can do is i

can use a for loop and i can say for i

in my names so i will say for i in

names

now what do you want to do so open up

your brackets here

and then we would want to say print i

and then basically close the bracket so

you see for every element in this vector

it is basically going to print the name

one by one so you are iterating through

a set of objects

by using a for loop now this is how we

can work on

for loop
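
A sketch of the for loop above. The narration stores the values in a variable called names; this sketch uses the hypothetical name `people` instead so that the built-in `names()` function is not masked:

```r
people <- c("vj", "aj", "dj", "sj")

# i takes each element of the vector in turn
for (i in people) {
  print(i)
}
```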

so if else while and for loop can be

very useful when you would want to

iterate or when you would want to check

the value of an expression or

when you would want to loop and do a

particular task

it's always good to

understand how you manage flow control

in r that is either when you're working

with your for loops your while loops

also understanding how you can use your

logical operators for working with your

data in r

so let's look at some examples and

understand logical operations

so either you could be having and or you

could be doing a or where you are

evaluating one condition or you are

using not so these are your logical

operations now here i can assign a value

to x

and then i can check if my x value is

less than 10 and it shows me false

so

i have been

checking the value of x so let's see is

it greater than 10 and that's true

now i can use logical operations here so

i can say

and so i'm saying is my x value less

than 20

and is my x value greater than 10 now

these conditions are not both true so in

this case we get the result as false

but if i say x is greater than 20 which

is true and

i am saying x is greater than 5 that's

also true and

x is equal to 25 now whenever we are

talking about and we have to look at all

the conditions have to

be right so let's look at this and we

get the value as true but if i say x is

greater than 10 or x is greater than 5

then one of the conditions has to be true

which is true in our case so we get the

result as true

we can take a different example we can

say is x less than 20 which is not true

but is x equals to 30 and that's also

not true so in this case we get result

as false

now we can straight away compare some

numbers and we can say is 12 equals 3

and that's false

and if i say not then that basically

will give me the result as true

so these are some simple logical

operations which help you when you're

working with your data in r
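
The comparisons above can be sketched as follows. The value 25 for `x` is inferred from the narration (the check "x is equal to 25" comes out true) rather than stated outright:

```r
x <- 25

x < 10                             # FALSE
x > 10                             # TRUE

# AND: every condition must be TRUE
(x < 20) & (x > 10)                # FALSE, because x < 20 fails
(x > 20) & (x > 5) & (x == 25)     # TRUE

# OR: at least one condition must be TRUE
(x > 10) | (x > 5)                 # TRUE
(x < 20) | (x == 30)               # FALSE

# NOT flips the result
12 == 3                            # FALSE
!(12 == 3)                         # TRUE
```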

now we can create a data frame by using

an inbuilt data set mtcars

and let's look at our data frame so that

shows us the values with all the

different car models

and the different column names so car

models are the row names and then you

have other things like mileage and

cylinder and so on which are the

specification for the data now what i

can do is i can filter out values here

using indexing so i can say data frame

now in that data frame

i would want to compare the value of

mileage which is greater than or equal

to 30 and

then i can end it with comma so that

gives me the value wherever the mileage

is greater than 30.

i can also do a subset on data frame

where i can select a particular value

so

we can be doing this

or we can be using

square brackets we can also do a dollar

and compare the values now we will use

our logical operations knowledge here so

we will work on data frame where i am

interested in the mileage which is

greater than 20 and

i am looking at the column hp horsepower

and that should be greater than 100

remember when we are doing a and both

the conditions have to be

met as true and that shows me the result

where you are looking at the mileage and

you are looking at the horsepower column

both of these are

met and that's why we get the result

so these are some simple examples of

using your logical operations either

when you're working on a data frame so

same thing can be done on a matrix same

thing can be done on a list or a vector

or individual values
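
A sketch of the filtering described above using the built-in `mtcars` data set, where `mpg` is the mileage column and `hp` is horsepower. The variable names `df`, `high_mpg` and `efficient_and_strong` are made up for this example:

```r
df <- mtcars

# Rows where mileage is at least 30; the trailing comma keeps all columns
high_mpg <- df[df$mpg >= 30, ]
print(high_mpg)

# The same filter using subset()
high_mpg2 <- subset(df, mpg >= 30)

# Both conditions must hold: mileage above 20 AND horsepower above 100
efficient_and_strong <- df[df$mpg > 20 & df$hp > 100, ]
print(efficient_and_strong)
```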

now let's also learn about flow control

that is how if else or else if is

handled in r so you can do a single

condition check so for example i assign

a value to hot which is false

and i'm saying temperature is 50. now

what i would want to check is if

the temperature value is greater than 60

which in our case will not be true

which will not be true because

temperature has been assigned 50 so is

it greater than 60 no

so

if i do this

if condition

and i am saying if the condition is true

then i would want to assign the value of

hot to true

and now if you look at the value of hot

it is still false why because the

condition which we passed for our if

is not true

it has not been met

so whatever was passed within the

statement has not been done

now let's change the value of

temperature as 100 and now if we do the

same thing we say

is my temperature greater than 60 which

is right so then whatever has passed in

the bracket will be applied so hot will

be assigned new value and now if you see

the hot value is

set to true so this is a simple single

condition check what you are doing now

certain times there can be multiple

conditions to check and that's where we

use else

so in this case we go for assigning a

value to score which is 63 so let's do

that

and now

let's say is my score value greater than

80 which is not true so whatever is

passed in here

which is print it's a good score will

not be done

but it will jump to else and then

whatever we have passed in else will be

done so it will say it's not a good

score so let's do this if

and it says it's not a good score so

this is a simple way of using if else

where you are checking two conditions or

you are checking the condition but what

if the condition is not met

then your control is passed to your next

statement
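
The single condition check and the if else check above, as a sketch. The variable name `temp` and the exact message strings are assumptions:

```r
hot <- FALSE
temp <- 50

# 50 is not greater than 60, so hot stays FALSE
if (temp > 60) {
  hot <- TRUE
}
print(hot)

# With temp at 100 the condition holds and hot becomes TRUE
temp <- 100
if (temp > 60) {
  hot <- TRUE
}
print(hot)

# if else: exactly one of the two blocks runs
score <- 63
if (score > 80) {
  print("It's a good score")
} else {
  print("It's not a good score")
}
```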

now i can also do an else if so i can

say score is 63 and i can say is my

score greater than 80 that's my first

condition so it would print good score

but maybe i would want to check

something else so i'll say else if

and i'll say is my score greater than 60

yeah

and is it less than 80 remember the and

which has to

evaluate

to true for both the conditions so i'll

say print decent score

i can still keep on giving conditions

here in else if score less than 60 and

score is greater than 33 that would not

meet so that will be ignored

and then you have else which says print

poor so

first it checks or evaluates for the

condition which you have passed for if

if that doesn't work then it goes to

else if and if anything in else if is

met then it's going to take that

into consideration and it will not go

for else and if the if and else if conditions

are not met then it goes to else

and we see decent score already printed

here now that's a simple example of if

else

and if else if

wherein we are evaluating a condition

but probably we have multiple other

things to check
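
A sketch of the else if chain above. The label "Average" for the third branch is an assumption, since the narration does not say what that branch prints, and `&&` is used here because each condition is a single value:

```r
score <- 63

if (score > 80) {
  result <- "Good score"
} else if (score > 60 && score < 80) {
  result <- "Decent score"
} else if (score < 60 && score > 33) {
  result <- "Average"    # hypothetical label
} else {
  result <- "Poor"
}
print(result)
```

Only the first branch whose condition is true runs; the remaining branches are skipped.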

now how do you work with while loops in

r that's very simple so what we can do

is we can assign a value to x

and now i will say while

my x is less than 10. so i'm going to

create a loop so i have said my x has

been assigned a value of 0 and that's

fine so this is going to be less than 10

but

if we are going to just do this then it

will keep running and it will get into

an infinite loop so we'll see how we do

that so we'll say while x is less than

10

i would want to

basically have the value of x i would

want to print x is still less than 10

adding 1 to x and what we are doing is

we are incrementing the value of x now

if you do not do this step

then it will get into an infinite loop

because x will be always less than 10

so we are incrementing the value of x by

one

and then we are giving a condition so if

at any point of time

x is equals to 10 i would want to say x

is equal to 10 terminating the loop

and then basically my while loop ends so

we can do this

so let's say x is 0 and then do this

while loop and now you see

it is at every step it is basically

printing out the value of x it is still

less than 10 adding 1 to x

and it also gives you

the value of x

so when we do a x is currently

and i print out the value of x so it

shows me 0

next time you increment it it becomes 1

and 2 and so on so this is where you are

using a while loop where you are looping

where based on a particular condition

and then you basically have

once the condition is met you are able

to

complete the loop
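
The while loop above, as a sketch. The message wording is approximated from the narration:

```r
x <- 0

while (x < 10) {
  cat("x is currently:", x, "\n")
  print("x is still less than 10, adding 1 to x")

  # Without this increment the condition never changes and the loop never ends
  x <- x + 1

  if (x == 10) {
    print("x is equal to 10, terminating the loop")
  }
}
```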

now let's look at

let me take this one here we'll look

into functions in a later stage

so let me take this function

and let's get rid of this one

i would also want to talk on break

statements and while loop and once we

are done with the flow control on while

loops then we can look at the functions

aspect

either we can look at how we control our

functions or how we use built-in

functions so let's look at this one

and let's continue with our while loop

so

we just saw a simple while loop here

and what we also want to see is when you

are working with your while loop

how do you break if a particular

condition is met

so we saw a simple example of

while loop

and that's fine

wherein we were printing out something

we were auto incrementing the value of x

we were also checking at one point of

time within our while loop

if the value of x was met

we would say we are terminating the loop

and it comes out of that

now if that does not happen then we

continue doing it

how about a break statement so break

statement is when you would want to end

the while loop

if a particular condition is met so for

example here i assign a value to x which

is 0 now i want to evaluate this lesser

than 5 so that means i will be auto

incrementing the value of x so i'll

create my while loop will give in a

condition that x is less than 5 now what

i want to do is i want to use the cat

function which will print the value so i

am saying x is currently

and i am printing out the value of x

then i say print x is less than 5

because we have not yet incremented the

value of x we are adding 1 to x like

what we saw in previous example

i am saying x is

then

incremented by 1

and here i'm saying if x reaches 5 so

while we keep incrementing the values

within the while loop we'll see if x's

value is 5 we will print it is equal to

5 and we can just do a break

now

if you do not use a break

you can still end the while loop but

break is basically to end this loop here

based on condition which is met and we

can do this and then run this while loop

so you see here x was met as 5 and we

just broke out of the loop

so that's your simple while loops what

we are seeing
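
A sketch of the break example above, with message wording approximated from the narration:

```r
x <- 0

while (x < 5) {
  cat("x is currently:", x, "\n")
  print("x is still less than 5, adding 1 to x")
  x <- x + 1

  # Leave the loop immediately once x reaches 5
  if (x == 5) {
    print("x is equal to 5")
    break
  }
}
```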

similarly we can work on for loops

so for loops can also be useful

so your conditionals what we saw as if

else or else if your while loop is while

a condition is

not yet met you keep looping

and keep doing some actions now what you

can do is you can also work on for loops

so here i'm creating a vector

and then

i am going to loop

that is i'm going to iterate through

every element so i'll say

for and when you're using for loops

you'll say for

and then you can give anything you can

give any name i can say i i can say x

so i'm just giving temporary variable in

vector and then i'm printing it out so

this basically prints all the values one

by one so there is one more way to do it

you can say for

and you can say

i in

and i would want to take

length of the vector so 1 to the length

of vector that is till the last element

is reached i would want to print

the vector elements using the value of i

so what is i here it's the index

position and i can do it in this way
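
Both ways of looping over a vector can be sketched like this. The vector values and the names `vec` and `temp_var` are made up:

```r
vec <- c(10, 20, 30, 40, 50)

# Loop over the elements directly; the variable name is arbitrary
for (temp_var in vec) {
  print(temp_var)
}

# Loop over index positions and look each element up
for (i in 1:length(vec)) {
  print(vec[i])
}
```

In practice `seq_along(vec)` is often preferred over `1:length(vec)`, because it does nothing when the vector is empty instead of counting from 1 down to 0.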

so if you are looping over a list

so i'm creating a list and it's very

simple so you can just do a for loop

where you can say for i in list i want

to print the i and that gives me the

list elements or you say for i in and

you give from starting position that is

1

till the length of list and you would

want to print every element so here we

can also use double brackets

so

if you would want to loop through a

matrix so sometimes that might be

required so let's create a matrix which

has 1 to 25 values arranged by row and you

look at your matrix and now what you

want to do is you want to iterate

through a matrix

so you want to do a looping so i'll say

for i in matrix i would want

to print out the values and that prints

out

all the values in matrix
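
A sketch of looping over a list and a matrix as described above. The list contents and the names `my_list` and `mat` are assumptions:

```r
my_list <- list(1, "two", TRUE)

# Loop over the list elements directly
for (item in my_list) {
  print(item)
}

# Loop by position; double brackets extract the element itself
for (i in 1:length(my_list)) {
  print(my_list[[i]])
}

# A 5 x 5 matrix holding 1 to 25, filled row by row
mat <- matrix(1:25, nrow = 5, byrow = TRUE)
print(mat)

# Looping over a matrix visits every value, column by column
for (value in mat) {
  print(value)
}
```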

now

what if i want to print the square and

square roots of numbers between 1 to 25

so

i can say for i

wherein the value starts with 1 ends

with 25

and then within my for loop

i can basically

give this condition where i am saying

get me the square that is i into i

or get me a square root of i and

just

print it out so i am saying message i is

this one square root is this and

my square is this and square root is

this so if i look at this values

here

now i am looking at all the values from

1 is to 25 i am looking at the square of

the values and i am looking at the

square root so what we did was we did a

for

we passed in the elements by saying i in

1 to 25 and within the bracket i have

said what do i want to do for every

element so

either i have calculated a square i have

calculated a square root and then i am

printing out when i am using the message

function which takes the value which you

are passing in

comma the value of i

similarly square and similarly square

root so these are some simple examples

of understanding flow control in r that

is using your for loops your while loops

and also your if else
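
The squares and square roots loop above, as a sketch. The variable names `sq` and `sqroot` and the message wording are assumptions:

```r
for (i in 1:25) {
  sq <- i * i          # the square
  sqroot <- sqrt(i)    # the square root

  # message() joins its arguments and writes them on one line
  message("i = ", i, ", square = ", sq, ", square root = ", sqroot)
}
```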

later we will spend time in learning

about functions

which could be either created by the

user or built-in functions and also

factors in r

welcome to this section of R

programming where we will learn about

functions whether that is about inbuilt

function or creating your own functions

and working on

your different data structures

so what are functions

function

is basically a set of statements to

perform a specific task

now

r

has a large number of inbuilt functions

or you can say packages which you can

import and start using

or users can create their own functions

so when it comes to functions the syntax

is very simple

you give a function name

you can assign

your function to a variable and a

function can take no arguments one

argument

or any number of arguments so let's see

some example on functions so for example

here we are creating a variable called

squares and we are assigning a function

to it now this function would take one

argument which is a

and then we use a for loop so we say for

i in

from 1 to the number a

we would basically be doing a

exponential computation

so what we would do is we would

square the value in this particular

range and assign that to b and print it

now when we do this

we can

call in this function and pass in a

value to look at the

square

of that particular value now this is a

simple example of function so this is

how it would look depending on what

value you have passed to the function so

for example we say squares and we pass

in a value of 4 so that becomes for i in

1 to 4

so you would start with 1

the value of

1 square would be 1

and then you have

your value for

2 so 2 square would be 4

then we have

3 3 square would be 9 and then we have 4

square which is 16. this is a simple

example of function and this is how you

can create your own function to

calculate or carry out some computations
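
The squares function described above can be sketched as:

```r
# Prints the square of every number from 1 to a
squares <- function(a) {
  for (i in 1:a) {
    b <- i^2
    print(b)
  }
}

squares(4)   # prints 1, 4, 9 and 16
```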

now let's look at some other examples

before we get into built-in functions

which basically allows you to work with

different data structures

so there are different mathematical

functions which can be used

for your data science or

computations

you have your regular expressions which

can be used for pattern matching

or you can also use functions for data

manipulation now before we get into data

manipulation

let's look at how you work with

functions taking some examples

so let me bring up my r studio

wherein we will try out some examples

and see how functions work

now here are some examples and we can

see how this work let me just clean up

the console and we can start here

now here we are creating a simple

function which does not take any

argument

we call it as hello world and this will

start with the word function

and parenthesis now that could have

arguments passed in however this

function we are not passing in any

argument

and what we are doing here is we are

printing out whatever value is passed

within the bracket

so let me just do a ctrl enter my

function is created and you can straight

away call this

by

just doing this

now however if you would have tried this

function without the bracket

for example something like this then it

would have printed out the complete

function definition

and whatever you passed in to hello

world but if you would want to call the

function then basically you would just

do hello world and then use the brackets

so that's how you call the function and

that's how it shows the result
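
A sketch of a function with no arguments. The name `hello_world` and the printed text are assumed from the narration:

```r
hello_world <- function() {
  print("Hello, World!")
}

# With parentheses the function is called
hello_world()

# Without parentheses R prints the function definition instead of running it
hello_world
```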

now your function can be with a single

argument so for example here we are

passing in an argument called name

and we can then use this to pass a value

to this so here i'm saying hello name

i have my function but this one takes a

single argument and we are going to use

paste which basically can concatenate or

just adds up whatever you are passing in

to paste so we will say paste hello and

then the name

notice that i have given a space here

after hello so that i can have it in the

right format and i can just do this so

the function is created and now let's

pass a name here and just try to call

the function so name is one argument or

a single argument which is passed to

this function

so let's look at the result and that

shows me the name whichever was passed

to this one
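
A sketch of a function with a single argument. The name passed in is hypothetical. Since `paste()` already inserts a space between its arguments by default, this sketch leaves the trailing space out of "Hello":

```r
hello_name <- function(name) {
  print(paste("Hello", name))
}

hello_name("Sammy")
```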

now what we can also have is function

created which takes two arguments and

this is a simple example so here we are

creating a function add num

i'm saying function it takes two arguments

i'm not providing any value or default

values for this we'll see some other

examples for those

now here this particular function takes

two arguments

and whatever you pass in here

the addition of those two will be seen

so let's create this function and let's

call it and test it and that shows me

the result as 70.

now what we can also do is we can add a

vector to a number so vector is list of

elements or list of objects you can say

and here we would want to

perform add num

or we would have to call add num

function by passing in vector which

becomes the first argument and the

second value is the next argument so

let's run this one and that shows me the

result wherein 5

as a value has been added to every

element of this particular vector
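
A sketch of the two-argument function above. The argument names `num1` and `num2` and the input values are assumptions; 30 and 40 are chosen only because they give the 70 mentioned in the narration:

```r
add_num <- function(num1, num2) {
  num1 + num2
}

add_num(30, 40)              # 70

# Passing a vector: 5 is added to every element
add_num(c(10, 20, 30), 5)    # 15 25 35
```

This works because arithmetic in R is vectorised, so the single number is reused for each element of the vector.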

now when it comes to function you can

also have default argument values which

can be passed so here let's look at as

an example so we have hello name again

but this time instead of passing in just

an argument we will also provide it a

value or you could say that could be

considered as a default value now

when we create this function we are

doing the same thing as previous

examples but we are passing in an

argument and that argument has a value

now once i do this

i can surely call this function without

passing in a value and that shows me the

name which we had assigned to the

argument

or we can even pass in a new name which

will be assigned to name so if we do

this it works in both the ways fine so

this is in one way you are passing in a

default value

and then basically

you can

either call the argument

or

you can assign it a new value

so if we would do something like this

hello name

and then for example i would say name

equals

say

jerry

and if i would do this so that also

works fine however since we are passing

in an argument we are assigning a value

so either we can let it go for the

default or we can just pass in the value

or we can be very specific in mentioning

the argument and then the value for it
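
The three ways of calling a function with a default argument can be sketched like this. The default name "Sammy" is an assumption; "Jerry" is the name used in the narration:

```r
hello_name <- function(name = "Sammy") {
  paste("Hello", name)
}

hello_name()                 # uses the default value
hello_name("Jerry")          # passes a new value by position
hello_name(name = "Jerry")   # names the argument explicitly
```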

now how do we return value from a

function let's look at this so here

we are creating a function we are

calling it full name

and this one takes name

wherein we are giving sachin and title

is say tendulkar

and what we would do is we would use a

return statement here so return would

basically

use the paste function it will take the

values of name and title and then glue

them together however we are using also

a space

so that there is a space between these

two

values to the arguments which are passed

now if i run this argument sorry this

function

my function is created now we have

already passed named arguments or we

have already passed value to those so we

can straight away say just call the

function and that does

whatever you have mentioned in the

function body

i could have also said that i could

create a new one

wherein i will pass new set of values

which we saw in a previous example

and then if we call this it takes up the

new values so

either you can let it go for default

like what we did here

we can also pass in new values

or if you would want to keep it specific

you could basically say

full

underscore name

that's my function

i could say name equals and i can say

john

and then i will say title

smith

and that's also fine

so we can do this and that works in the

same way as it would have worked with

just passing in the names

so this is fine and if you would want to

test it out say for example if i would

just take off name here and just do this

that also works perfectly fine wherein

we are

still using these arguments in the

particular order now if i would have

changed this one to name

wherein i am already passing in a value

for name and if i tried this

so in this case

what happens is name

is smith

and

basically your title becomes john

right so we have to remember

what arguments we are passing and if we

are basically

assigning values to the arguments or

letting it pick up the default ones
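
A minimal sketch of the `full_name` function and the calls walked through above. The exact code on screen is not visible in the recording, so the layout is an assumption. The last call shows why argument order matters: R matches named arguments first and fills the remaining ones by position.

```r
# Returns name and title glued together with a space in between
full_name <- function(name = "Sachin", title = "Tendulkar") {
  return(paste(name, title, sep = " "))
}

full_name()                                 # defaults: "Sachin Tendulkar"
full_name("John", "Smith")                  # by position: "John Smith"
full_name(name = "John", title = "Smith")   # fully named: "John Smith"
full_name("John", title = "Smith")          # mixed: "John Smith"
full_name("John", name = "Smith")           # name is taken, so "John" becomes title: "Smith John"
```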

so let's do this and that looks okay now

when you talk about scope of a variable

okay now before we understand scope of a

variable let me show you some more

examples on function now say for example

if you were using built-in functions we

have lots and lots of

built-in functions which are available

for programmers

which

they can use in their data science

activities or data processing or

computation now here we are using a

function called rnorm

to generate thousand random values from

a normal distribution of mean zero and

standard deviation one

so i would use rnorm

that's an inbuilt function

and i will call this say normal

distribution

so that is already done now we can find

out the mean

on these random values which would have

been generated using the inbuilt mean

function

and that works perfectly fine you can

also create a histogram out of this and

if i do this it shows me the histogram

so let's see

the histogram here

let's bring it out and that shows me the

histogram of normal distribution if you

would be interested in knowing about a

particular inbuilt function you can just

do a question mark and use the function

and that basically shows you the

documentation of the function so this is

a generic function

which computes a histogram of a given

data value

and here it takes arguments so this is

basically your data this could be the

number of arguments which you are

passing in

for

your histogram to be created

now we can look at some more examples

here so i can say a histogram with

large number of interval breaks and this

is where i am also specifying breaks and

passing in a value so this

allows me to provide arguments to

functions by name

now the same example which we have given

here we can do it by position without naming the breaks

argument but as a good practice we

should

actually give name to the arguments

which we are defining so if i would do

this

when i'm passing in my data that is

normal distribution and then for breaks

i'm just giving the value 50 and that is

also fine it works perfectly fine here
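
The built-in functions used in this part can be sketched as below. The variable name `normal_distribution` follows the narration; the fixed seed is an assumption added only so that repeated runs give the same numbers.

```r
set.seed(42)  # assumption: not in the original, makes the run repeatable

# 1000 random values from a normal distribution with mean 0 and sd 1
normal_distribution <- rnorm(1000, mean = 0, sd = 1)

mean(normal_distribution)   # close to 0, but not exactly 0
hist(normal_distribution)   # default number of bins

# ?hist  # in an interactive session this opens the documentation

hist(normal_distribution, breaks = 50)  # argument given by name
hist(normal_distribution, 50)           # same call, argument given by position
```

Naming the argument is usually the safer habit, because the call keeps working even if the reader does not remember the order of the parameters.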

now

we can create our own function

which as we saw in some basic examples

functions which can be without arguments

say this is a simple example or with

arguments

so this one we have already seen how you

can create a function without giving any

argument or by giving an argument and

then basically calling in the function

now when it comes to optional arguments

so we can look at this function

wherein i would want to say find out the

exponential value of a particular number

so i call it expo value i use my

function i say this will take the value

x now that's an argument which we are

passing in we could have given it a

value or we will just let the user

provide the value when this function is

being called i will also give a default

argument which is power equals two

and here we would want to

get a histogram of the values

raised to a particular power so if i create

this particular function

that's done and now i will just pass in

my value i don't need to mention power

that has been given a default value yeah

if we would want to change it then we

can pass in that so let's run this one

and that gives me

exponential value a histogram based on

the normal distribution data and by

default

it is using power as 2.

now what we could have also done is we

could have specifically mentioned a

different value for power

and that works perfectly fine i could

have just passed in the value as power

and that also works fine
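
A sketch of the optional-argument example, assuming the function is written as `expo_value` and that the alternative power tried in the recording was 4 (the actual value is not audible).

```r
normal_distribution <- rnorm(1000)

# `power` is optional because it has a default value of 2
expo_value <- function(x, power = 2) {
  hist(x ^ power)
}

expo_value(normal_distribution)             # uses power = 2
expo_value(normal_distribution, power = 4)  # default overridden by name
expo_value(normal_distribution, 4)          # default overridden by position
```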

so here

you are using named arguments and

basically

passing in any other arguments

now what we can also do is we can use

these named arguments and then we can

also pass any additional arguments

that is what we call as passing any

other arguments now if you look at the

explanation of this

hist function histogram function

if you look at this

it shows me these three dots and this is

what we can use to pass in

any other arguments so let's look at an

example for this one

so say for example

i would want

to

create a function where i am passing

named arguments

i am passing in the data but then i

would also want to pass any other

arguments which can be passed

dynamically now for that we can create a

function here

wherein i am calling it expo value again

i am passing in my x which will be the

data which we will pass in

you are mentioning power which is two

which is a named argument which can also

be considered as a default or you can

change the value

or you can provide a new value and then

i am also giving these three dots which

are also passed in within this

particular function
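
A minimal sketch of the function just described, together with the call that follows in the narration. The three dots collect any extra arguments and hand them on to `hist` unchanged.

```r
normal_distribution <- rnorm(1000)

# Anything passed beyond x and power travels through ... to hist()
expo_value <- function(x, power = 2, ...) {
  hist(x ^ power, ...)
}

# breaks is not a parameter of expo_value; it reaches hist() via ...
expo_value(normal_distribution, power = 2, breaks = 50)
```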

so let me create this function here

now once that is done

then i can call this function by passing

in my data which is normal distribution

power is 2 and then i'm also using these

breaks for getting my histogram with

intervals

of 50 so let's call this function

and that gives me the histogram now what

we can also do is sometimes it might be

useful to pass logical arguments so for

logical arguments what we can do is we

can create a function which will take

the data

here i am using a named argument exp

that is for exponential i am saying

the default value of

hist is false

and then i am also giving any other

arguments so what we will do here is in

this function we will say if

the value of hist is true then this

block of code will get executed where

you will get a histogram

based on the exponential

which has been assigned in the function

passed as an argument

and if that doesn't hold true which is

by default false as we have given in our

function then this piece of code will

get executed so let's create this

function

and that's done and now we can straight

away just pass in our data

exponential value is given as 2

histogram has been given as false that

means the else part of the code will get

executed

and we can look at the values here i can

also say

histogram

is true

and that's where we will be calling in

the hist function and i can do this that

shows me the histogram so in this way we

can pass in named arguments we can pass

logicals

and then we can also pass any other

arguments for our use case
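
The logical-argument version can be sketched like this. The argument names `exp` and `hist` follow the narration; the exact body shown on screen is an assumption. Calling `hist()` inside still works even though an argument is also called `hist`, because R skips non-function objects when it looks up a function call.

```r
normal_distribution <- rnorm(1000)

expo_value <- function(x, exp = 2, hist = FALSE, ...) {
  if (isTRUE(hist)) {
    hist(x ^ exp, ...)   # draw the histogram of the powered values
  } else {
    x ^ exp              # just return the powered values
  }
}

head(expo_value(normal_distribution))          # else branch: numeric values
expo_value(normal_distribution, hist = TRUE)   # if branch: histogram
```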

now looking further in functions let's

also understand the scope of a variable

in a function

so here i am saying v

and then i'm saying

i am global variable let's create this

and then i am saying stuff so i am

global stuff so this is basically we

have assigned some values to variables v

and stuff now let's create a function

where i'll say fun

i'll use the function and i will say

this will take my variable stuff

i'm saying print v

and then for stuff i'm assigning in a

new value and then i'll print stuff

so let me create this function and let's

see how it works so if for example i

would just say print v

that shows me

the global variable which we had created

earlier and since i'm using that within

my function

it basically has the value now i also

have a global stuff so i'm saying print

stuff

and that shows me whatever was assigned

to the variable and now we will

basically call the function

by giving in

the argument as stuff

the variable which we had created

now

if we do this then it says

reassigning stuff inside the function

and that's because within the function

we are basically assigning a new value

to this stuff

now i can also just do a print stuff

and if you see it still goes back and

prints the

global variable so only within the

function

reassignment happened and that's what we

understand when we talk about global

variable or local variable now
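
The scope example can be reproduced with the sketch below. The strings follow the narration; the function name `fun` is taken from the recording.

```r
v <- "I am global variable"
stuff <- "I am global stuff"

fun <- function(stuff) {
  print(v)                                          # v is found in the global environment
  stuff <- "Reassigning stuff inside the function"  # changes only the local copy
  print(stuff)
}

fun(stuff)
print(stuff)   # still "I am global stuff"
```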

to create a function to find the final

output amount to be paid by a customer

after adding 20 percent

tax to the purchased amount how do we do

that

so

i'm here creating a function

which will take x as hundred

and what does that function do

we would want to basically

find out

the amount which is paid by customer

after adding 20 percent of tax now how

do we calculate that so we take x

plus

20 percent of x and that would be the

final amount which will be paid so we do

a return t

and this is my function so let's create

this function and then let's pass in a

value to see what is the

amount which customer would pay

with an addition of 20 percent tax so

this is a simple function where we are

passing in one argument we are giving it

a value and then we are doing

computation within the function body
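
A sketch of this tax function. The name `amount_with_tax` is hypothetical; the default of 100 and the variable `t` follow the narration.

```r
# Hypothetical name; adds 20 percent tax to the purchase amount x
amount_with_tax <- function(x = 100) {
  t <- x + x * 0.20
  return(t)
}

amount_with_tax()      # 100 plus 20 percent
amount_with_tax(250)   # 250 plus 20 percent
```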

what we can also do is we can create a

function where i am passing in an

argument

and i can then

check the value of that so if the

argument

passed was greater than zero then we

would find out the final amount which is

amount plus 20 percent of the amount

if the amount is less than or

equal to zero then our final amount is

equal to amount

and we return f amount so here we will

be evaluating these conditions and based

on that my function will return the

value so let's create this function and

pass in a value

and that shows me hundred so you can

just test this by saying amount one

and say for example i would have passed

in zero now in this case my final amount

is zero because there is no amount which

needs to be paid by the customer
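
The conditional version can be sketched as follows. The function name `final_amount` is an assumption; `f_amount` follows the narration.

```r
# Hypothetical name; tax is only added when the amount is positive
final_amount <- function(amount) {
  if (amount > 0) {
    f_amount <- amount + amount * 0.20
  }
  if (amount <= 0) {
    f_amount <- amount
  }
  return(f_amount)
}

final_amount(100)
final_amount(0)    # nothing to pay, so the result is 0
```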

now

checking the argument and the body of a

function so

we can always use this inbuilt function

args which will tell me

for this particular function what are

the arguments and body which shows the body of

the function which basically tells me

whatever we have coded within the

function body
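
A short sketch of inspecting a function. `args` and `body` are base R functions; `formals` is added here as a related helper that the recording does not mention, and `add_tax` is a hypothetical function used only as something to inspect.

```r
# Hypothetical function to inspect
add_tax <- function(x = 100) {
  t <- x + x * 0.20
  return(t)
}

args(add_tax)      # shows the argument list
body(add_tax)      # shows the code inside the braces
formals(add_tax)   # the arguments as a named list (not mentioned in the recording)
```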

now to understand the scope we can

create a function here which is taking

an argument x and what does this do so

we assign a value to y

then

we basically say g one and here

i am using function of x now what does

that function of x do

so this one will take the value of y

plus

multiply x by itself

so this is

a function which we are creating

and then i am saying g1 of x

so what you are doing is

whatever value was passed in as x

for that function x will be applied so

let's create this function and then pass

in a value 10

and that gives me the result as 110.
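
The nested function can be sketched as below. The outer name `f1` is an assumption based on the `f2` used later; `g1` and `y` follow the narration. The inner function finds `y` because it is defined inside `f1`, where `y` lives.

```r
f1 <- function(x) {
  y <- 10
  g1 <- function(x) {
    y + (x * x)    # y comes from the enclosing function f1
  }
  g1(x)
}

f1(10)   # 10 + 10 * 10 = 110
```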

similarly we can create another function

where we want to do some computation

and then i am creating one more variable

which has basically the function

pass in a value for y and then basically

what you do is

you are calling in your g2 function

and then

let's call in this function

so let's do this

and

let's also create f2

and then finally we will call in f2

which is internally calling g2

so these are some simple examples where

you are doing some computations

and creating some simple functions let's

also create a function

which is taking two arguments so here i

have g2

function takes two arguments x and y

what does that function do here we are

saying y plus x into x that's my g2

and similarly i'll create f2 which is

going to have a value assigned to y

and this one

is going to call in my g2 function

which will take

x and y

x

which we are passing in here

and y which we have assigned

so let's create this

and then let's call our f2

and what does that f2 do it basically

has the value of y assigned and then it

does

whatever is mentioned in g2

with our x and y values so i am passing

in 10 here

so it is

basically

y which is 10 plus

you have

the x value which has been passed here

so let's look at the calculation which

is 10

so that gives me 110 so 10 into

10 plus the value of y

so this is how we can create functions

which have been assigned some values and

then pass in some other values to those
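
A sketch of the two-argument version, using the names `g2` and `f2` from the narration. Here `y` is handed to `g2` explicitly instead of being looked up from an enclosing function.

```r
g2 <- function(x, y) {
  y + (x * x)
}

f2 <- function(x) {
  y <- 10
  g2(x, y)   # pass both values explicitly
}

f2(10)   # 10 + 10 * 10 = 110
```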

let's look at some more examples here

when we work with functions

and see how we can use functions to

carry out our basic operations or

calculations so for example here

i am creating a function

and this will take an argument wherein

we are saying it would be marks

now let's do this and the function body

would say result is not defined

now if the marks are greater than 50

then result will be

pass

and you will have the message which is

your result is

and then you are passing the value of

result

so let's look at this one so

let's create this function pretty simple

function

and then

let's pass in a value here so i'll say

status as 60

which will be checked for the value

greater than 50 or lesser than 50

and that tells me your result is pass

and if we give this one then it says

your result is not defined however we

can

have additional statements here which

can say if the result was

lesser than 50 then what should have

been printed this is a simple example
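
The marks example can be sketched as follows. The function name `status` and the strings follow the narration; returning the result invisibly is an addition made here so the value can also be reused.

```r
status <- function(marks) {
  result <- "Not Defined"
  if (marks > 50) {
    result <- "PASS"
  }
  message("Your result is ", result)
  invisible(result)   # addition: not described in the recording
}

status(60)   # Your result is PASS
status(30)   # Your result is Not Defined
```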

let's look at one more example and here

my argument

is age now just notice that we are not

passing any default values or we are not

passing any values to the arguments we

are just passing in an argument

which will be assigned a value when you

call the function now here we say age

group is not defined we say vote is not

defined and then we start using some

condition checks

so i say if the age passed is greater

than 18 then

the age group would be adult and the

person can vote

and

message your age group is and voting

status is will be printed out

so

we can use this or from our previous

learning we can do a if else and modify

the function

so let's create this function and then

pass in a simple value to this and that

tells me what is your age group and what

is your status to vote
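
A sketch of the voting example. The function name `vote_status` and the exact message wording are assumptions; the variables `age_group` and `vote` follow the narration.

```r
# Hypothetical name; age has no default, so it must be supplied in the call
vote_status <- function(age) {
  age_group <- "Not Defined"
  vote <- "Not Defined"
  if (age > 18) {
    age_group <- "Adult"
    vote <- "Yes"
  }
  message("Your age group is ", age_group, " and voting status is ", vote)
  invisible(c(age_group = age_group, vote = vote))  # addition for reuse
}

vote_status(25)
vote_status(10)
```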

so

now if we would want to create a

function to convert a name into

uppercase

let's see how we can do that so we are

creating a function here which takes the

value name

now then we also find out the length of

this particular argument and for that we

are using an inbuilt function called

nchar

which will be

for your name

and you would want to find out the

length of this particular name and we

would say if the length is greater than

5

then we are again using an inbuilt

function called toupper which will

convert the argument or the name passed

to uppercase

we will say message

user given name is and then you print

out your name

so let's

call in this function

so let me first create this

and then i can call in this function and

we clearly see that the number of

characters in this word is more than 5

and that's why it is converted to upper

case however if you would call the

function with a name which has less than

5 characters

it stays

as it is
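
The uppercase example can be sketched with `nchar` and `toupper`, both base R functions. The function name `user_name` and the sample names are assumptions.

```r
# Hypothetical name; converts names longer than 5 characters to uppercase
user_name <- function(name) {
  len <- nchar(name)
  if (len > 5) {
    name <- toupper(name)
  }
  message("User given name is ", name)
  invisible(name)   # addition for reuse
}

user_name("Jonathan")   # more than 5 characters, printed in uppercase
user_name("Tom")        # 5 characters or fewer, printed as it is
```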

now this is again a simple function

which we created let's see how you can

create a function to calculate bonus

now here we are passing in two arguments

so this function takes two arguments one

is salary

and one is experience

and then we say if the experience is

greater than 5

then bonus percentage will be 10

and else bonus percentage will be 5

and here we will calculate the bonus so

first it will find out

how many years of experience a

particular employee has and based on

that a value of bonus will be assigned

or bonus percentage will be assigned

and then you say what will be the bonus

that is salary into the bonus percentage

and return the bonus amount so this is a

simple function let's basically

select this

and let's create this function and then

let's call the function if the

salary is 25 000

and experience is 6 years

and that basically will tell me the

value so let's look at the value it

tells me

2500 which is 10 percent of the salary

similarly if we go for this one which

will

basically go for the execution of else

part of the code we can do this and that

gives me the bonus as half of it
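
A sketch of the bonus calculation. The function and variable names are assumptions, and the experience value of 3 in the second call is only illustrative.

```r
# Hypothetical names; bonus is 10 percent above 5 years of experience, else 5 percent
calculate_bonus <- function(salary, experience) {
  if (experience > 5) {
    bonus_per <- 10
  } else {
    bonus_per <- 5
  }
  bonus_amount <- salary * bonus_per / 100
  return(bonus_amount)
}

calculate_bonus(25000, 6)   # 2500
calculate_bonus(25000, 3)   # 1250
```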

now

how do we handle multiple conditions and

multiple actions so let's look at that

so let's create a function which takes

one argument which is age

we would check if the age is greater

than zero then

we would want a nested if within this

condition

so if age is greater than 0 then

whatever we have given here will get

executed

and

this will be

this part of your code

and here

i am again checking if age is less than

18 then

age group would be kids

else

if

now else if is to check the second

condition so if the age was passed if it

is greater than 0 then we get into this

block of the code now it was greater

than 0 but then

is it less than 18 then i would

categorize the person as kids if age is

less than 60

then we will say

age group adult

else we will say age group senior

now we can

basically say that we could have given

more conditions to this because here we

are saying if age group is less than 18

then

the individual would be within the age

group of kids

if that is not true that is it is not less

than 18 so probably it is 18 or

greater than 18 then we are checking the

second condition if the age is less than

60 age group is adult and if

these two conditions are not met then it

jumps to else where age group is senior

and if this whole block

was ignored because age was less than 0

then

we would have just printed out age group

is not defined with the message wrong age

and your age is such and such so this is

our whole function so let's go ahead and

run this

now let's check the age group when the

age is 10

when the age is 40 when the age is 65 or

when the age is minus 10 which is not

defined
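
The nested `if` / `else if` logic can be sketched as below. The function name `age_category` and the message wording are assumptions; the thresholds 0, 18 and 60 follow the narration.

```r
# Hypothetical name; classifies an age into Kids, Adult or Senior
age_category <- function(age) {
  if (age > 0) {
    if (age < 18) {
      age_group <- "Kids"
    } else if (age < 60) {
      age_group <- "Adult"
    } else {
      age_group <- "Senior"
    }
  } else {
    age_group <- "Not Defined"
    message("Wrong age")
  }
  message("Your age is ", age, " and age group is ", age_group)
  invisible(age_group)   # addition for reuse
}

age_category(10)
age_category(40)
age_category(65)
age_category(-10)
```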

now there are some inbuilt functions

which can be used in r

such as your switch function so looking

at this

function that is switch function

we can see

or we can use this for our different

kind of operations so here your switch

function returns the value

matched with the first argument and first

argument should be a character let's

have a look at the example

so say for example you want to return

the

house rent allowance or hra amount based

on cities

so we create a function called hra now

that takes an argument which is city

name and here we will say what does this

function do

so here i am saying hra amount and i am

going to use the switch function

now switch function i am saying i would

want to convert the city name to

uppercase so that we can maintain some

consistency and here i am saying if

the city is bangalore it would be 7500

if it is mumbai

thousand if it is delhi eight thousand

chennai seven thousand five hundred

and you have five thousand value and you

are returning the hra amount now what do

we do with that

so let's

create this function

it's done and now we will pass in the

value

so we will see

whatever value has been passed to this

and that gives me the value here right

so switch is basically taking me

directly to this value now however if i

try to provide a city name which is not

given in the list

so when i'm saying say for example

pune

now what is happening is it is just

taking a value which has not been

assigned to any of these conditions

if i go for

again something else which is

in a lower case

now this is where your toupper

function will come into use and if we do

this it basically converts this into

uppercase bangalore and then basically

it gives you the value

so this is the usage of a switch

function
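
A sketch of the HRA lookup with `switch`. The Mumbai amount is garbled in the recording, so the 10000 used here is a placeholder assumption; the other amounts and the fallback of 5000 follow the narration. The last unnamed value in `switch` acts as the default.

```r
hra <- function(city) {
  hra_amt <- switch(toupper(city),
    BANGALORE = 7500,
    MUMBAI    = 10000,  # placeholder: the real value is not audible
    DELHI     = 8000,
    CHENNAI   = 7500,
    5000                # default for any city not listed
  )
  return(hra_amt)
}

hra("DELHI")       # 8000
hra("Pune")        # not listed, falls back to 5000
hra("bangalore")   # toupper makes the match case-insensitive: 7500
```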

let's look at one more example so for

example here we are creating a salary

range which will take an argument which

will be band and i will say these are

my bands or you can say these are my

options so i can say l1 is basically ten

thousand to fifteen thousand

l two

is so and so

and l three is so and so and you return

the range

now

let's create this function

sometimes you have to

do it this way

so our function is created and now we

can just do a salary range

given a value and that gives me the

range of the values however if you pass

something

which is not mentioned then it basically

prints out null
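
A sketch of the salary band lookup. Only the L1 range is stated in the recording; the L2 and L3 ranges below are placeholder assumptions. Because there is no default value, an unknown band gives `NULL`.

```r
salary_range <- function(band) {
  range <- switch(band,
    L1 = "10000 - 15000",
    L2 = "15000 - 25000",   # placeholder range
    L3 = "25000 - 40000"    # placeholder range
  )
  return(range)
}

salary_range("L1")   # "10000 - 15000"
salary_range("L4")   # NULL, since no option matches and there is no default
```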

so in r you can also use repeat which

can be useful and

what does repeat do so here i am

assigning a value to a variable called

time

let's do that

and then i'm giving a piece of code with

repeat now what does repeat do so you

are passing in a message which is

hello welcome to our tutorial

and then you are saying if time is

greater than or equal to 20 you would

want to break out from this loop

and

then you also increment the times value

and this will keep repeating till

this if condition is met

wherein we have said

time value starts from 15 so let's do

this

and this basically will print out

the message wherein first my time was 15

which was

less than 20

so you increment it it becomes 16 you

print it again 17 print it again 18

print it again 19 and 20 and as soon as

you reach the times value which is 20 it

breaks out of this and it stops printing

this particular message
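
The `repeat` loop can be sketched as follows, assuming the message text reads "R tutorial". The message is printed before the check, so it appears six times, for `time` values 15 through 20.

```r
time <- 15

repeat {
  message("Hello, welcome to R tutorial")
  if (time >= 20) {
    break            # leave the loop once time reaches 20
  }
  time <- time + 1
}
```

Without the `break`, a `repeat` loop never stops, so the exit condition is the part to double-check.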

okay now let's look at

some more examples so if you say r

we will use say a function to find the

square of any given user number okay if

the square value is less than 100 then

increment user value by 1 and find

square again and repeat this till square

exceeds

100

pretty simple so you create a function

which takes n as an argument and you

would want to repeat it

so you would want to repeat this by

squaring the numbers until the square

exceeds 100

and once it reaches 100

you will break out

so this is what we are doing and we are

auto incrementing or incrementing the

value of n by 1 every time

we calculate a square

and then you return the value of n

so let's create this function

and now let's calculate it for square 6

and that tells me what is the square now

as soon as your square value touches 100

it basically breaks out of the loop
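
A sketch of the squaring loop. The function name `squares` is an assumption, and the loop stops as soon as the square reaches 100, as in the run described above.

```r
# Hypothetical name; keeps incrementing n until n * n reaches 100
squares <- function(n) {
  repeat {
    square <- n * n
    message("n = ", n, ", square = ", square)
    if (square >= 100) {
      break
    }
    n <- n + 1
  }
  return(n)
}

squares(6)   # squares 36, 49, 64, 81, 100, then returns 10
```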

now if you would want to find balance in

a bank account after n years

if

a person has deposited x amount in the

beginning and bank gives a interest of

eight percent per annum right this is a

simple calculation so it needs the

amount which was deposited you need the

year and you need the

rate

now year which is n years can be given by

the user

rate we have already given 8 percent

however functions main functionality is

that you can even assign new values to

it say

later one month down the line the bank

rate changes maybe it increases maybe

it decreases then the

function need not be

modified it can just take up the new

values and start calculating from there

on

now here

we will say

get the final balance

function takes amount

the amount which would be deposited year

and that could be say four years or five

years or ten years for which you would

want to calculate the rate of interest

and add it to the amount

so i will say for i in one to year so

that depends on how many times you would

want to

run this loop

i would say interest would be

using the round function i am saying

amount into rate

whatever is the rate of interest

and then you are giving two decimal places

now

final amount

will be calculated

so you are basically saying amount plus

interest and you will pass in

a message where we'll say year

is the value of i that's first year or

second year

amount what is the amount what is the

interest you are calculating based on

the round function

and final amount will be amount plus the

interest

and then

you basically say

amount will be given

our final amount will be assigned to

amount now if this is a function you

would want to return the final amount so

let's select this

and then basically create a function

and let's say

i would want

the final balance if the amount

deposited was five thousand

it was kept in the bank for five years

and rate of interest was eight

now

that should basically give me

my final amount

and if we

double that

so we say amount is 10 000 number of

years is 10 but the rate of interest is

less so let's calculate this and that

gives me the interest however if you

notice based on my message it is

basically telling me what was the first

amount what was the interest what was

the final amount and it does that

for

all these number of years
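
The balance calculation can be sketched as below. The function name `get_final_balance` follows the narration; the message wording and the rate of 5 in the second call are assumptions, since the recording only says the rate is lower.

```r
get_final_balance <- function(amount, year, rate = 8) {
  final_amount <- amount
  for (i in seq_len(year)) {
    interest <- round(amount * rate / 100, 2)   # interest for this year, 2 decimals
    final_amount <- amount + interest
    message("Year ", i, ": amount = ", amount,
            ", interest = ", interest,
            ", final amount = ", final_amount)
    amount <- final_amount                      # next year starts from the new balance
  }
  return(final_amount)
}

get_final_balance(5000, 5, 8)     # 7346.64
get_final_balance(10000, 10, 5)   # assumed lower rate
```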

so these are some simple examples for

your functions right now we can also

look at on the similar lines we can

create some interesting functions so you

can find the total number of years

required to raise

eight thousand dollars if the user deposits

750 per month

so here you're not actually calculating

the final amount but

you would want

to find out how many years

are required to basically have the

amount as 8000 so

your function we are saying the amount

is say

550 or say 750

per month

now i would say let the final amount be

zero as of now month is zero and i will

say

while my final amount is less than or

equal to eight thousand i would want to

do something and that is you are

incrementing the value of month by 1

because that's your first time

your amount is less than 8 000 whatever

deposit was made say 750 per month and

then you have final amount which is

your

initial amount which has been assigned

to f amount that is zero plus the amount

you print out the message

and then you basically say year is

whatever value was passed for month

so you may want to have it for number of

years or years with particular amount of

month so we will calculate the year

value now here what we are doing is

we are calling in this required years

function

without an argument which takes the

default argument

or you can pass it with 750.

we can run this so let's create this

function pretty simple

done

and if we do not pass an argument then

the amount is 750

and it tells me what would be or how

much time it would take

for us to reach from say 750 or 550 to

final amount

similarly if i would have done this

it tells me

again a new value so we are finding out

the total number of years required to

raise

the amount to 8000 dollars

so these are some simple examples of

functions which you can use for your

operations your calculations and also

creating functions which can be

repeatedly used with

either no argument or one or multiple

arguments
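
The savings example with a `while` loop can be sketched as below. The name `required_years` and the target of 8000 follow the narration; the message wording and converting months to years by dividing by 12 are assumptions.

```r
required_years <- function(amount = 750) {
  f_amount <- 0
  month <- 0
  while (f_amount <= 8000) {
    month <- month + 1
    f_amount <- f_amount + amount   # one more monthly deposit
    message("Month ", month, ": total = ", f_amount)
  }
  years <- month / 12
  return(years)
}

required_years()      # default deposit of 750 per month: 11 months
required_years(550)   # smaller deposit: 15 months
```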

now

so far we were learning on creating our

own functions

and we also looked at using some inbuilt

functions

either

creating a plot or basically doing some

basic operations

or

passing in multiple arguments

so let's look at some more examples and

when we talk about

built-in functions there are lots and

lots of built-in functions which are

available in r

which can be used so let's look at these

so for example here are some built-in

functions

which can allow you to work with

different data structures for example

you have the seq function

which allows you to create sequences so

for example i could just say test nums

and i can just say seq

and here i can say where does it start

from so might be i can say 0

goes all the way to 50 and then i can

also say

if i would want a jump or how many

numbers should be used so for example

let's do this

and now if i look at test nums

so that shows me the value however not

to confuse we could have also done this

using assignment operator like this

and then look at your test nums so it

tells me

it has created a list of numbers from 0

to 50 which are even numbers now you can

always do a class on it

and let's look at this

and that tells me

the objects here are numeric

and say for example

i would use typeof

to see what is this test nums which we

just created there was a typing mistake

let's check this and it has the values

as double
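
A sketch of the sequence example. The variable name `test_nums` follows the narration.

```r
# Even numbers from 0 to 50
test_nums <- seq(0, 50, by = 2)
test_nums

class(test_nums)    # "numeric"
typeof(test_nums)   # "double"
```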

right so we have created a sequence here

where we are creating a list of numbers

which have

a space of 2 or you are saying about

even numbers now you can also use a sort

function so i can do a sorting here and

i can give it an increasing or a

decreasing order so if for example i

have created this sequence and i could

just create a simple variable like this

pass in a vector into this

which could be say for example i'll try

your test nums

and then

look at your v

so those are my numbers and you can

straight away do a sort on your test

nums so i could just do a sort on v

and that basically shows me the number

however i could also do a sort v

and then i could say

here let's check this v comma

and then you can say decreasing

equals true

and let's do this it just reverse or

puts the data in a reverse order or it

sorts based on decreasing value and

having the greater value in the

beginning and the lowest value at the

end
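
The sorting calls can be sketched as follows, reusing the sequence of even numbers from earlier.

```r
test_nums <- seq(0, 50, by = 2)
v <- c(test_nums)

sort(v)                      # ascending order (the default)
sort(v, decreasing = TRUE)   # largest value first, smallest last
```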

so you can use the inbuilt sort function

similarly you can use rev for reverse now

rev does not actually sort the

values it will just reverse the elements

in your sequence for example let's say

v2 and i will again use

this one

as c and then just passed your test nums

that's an easier way or i could have

created a new vector so i'll say test

nums

that's my v2 and you can do a reverse

on

v2

and that basically

shows me the values but here we see

let's see so we are looking at

okay so this was wrong i should have

given a capitals

and do it yeah this is fine

and we get the values however if i had

created something like this v3

and let's say

c

and then let's say 99

and two and three and four and five and

seventy eight and hundred

so that's

a vector i'm creating

and now what i can do is i can use the

reverse

on

v3

and you see

it has just reversed the elements in the

list

now we could have done this without

giving these brackets here

and it shows me the result

so this is good to understand what your

sorting does so sorting is basically

going to look at the objects and it's

going to sort them in ascending or

descending order

reverse is just going to

reverse the elements in your list now

similarly you can also use append which

is basically to combine objects so let's

say v4

and that will basically have

append

and let's say let's take v2

and let's take v3 and this is what we

would want to append

and now look at your value of v4

which basically has everything added

into one so this is your append
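
A sketch contrasting `rev`, `sort` and `append`, using the vector names from the narration.

```r
v2 <- seq(0, 50, by = 2)
v3 <- c(99, 2, 3, 4, 5, 78, 100)

rev(v3)    # 100 78 5 4 3 2 99: order flipped, nothing sorted
sort(v3)   # 2 3 4 5 78 99 100: values actually ordered

v4 <- append(v2, v3)   # v3 is added after the last element of v2
v4
```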

similarly you have other functions like

finding out the absolute value

of a number you would want to find out

the square root you would want to find

the sum of all the elements in a vector

you would want to find out the floor

value exponential value

of something

and you basically finding out the mean

value so these are some built-in

mathematical functions so you have

built-in simple functions you have

mathematical functions you have regular

expressions in r which can also be used

for pattern matching
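
The mathematical functions listed here are all part of base R. The input values below are only illustrative.

```r
abs(-5.7)             # absolute value: 5.7
sqrt(16)              # square root: 4
sum(c(1, 2, 3))       # sum of a vector: 6
floor(5.7)            # largest whole number not above 5.7: 5
exp(1)                # e raised to the power 1
mean(c(1, 2, 3, 4))   # average: 2.5
```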

now what we can simply do is we can

create a variable let's say text

sorry for caps let's say text and here i

will pass in something r is a

programming

language

for

data science

let's do this and now i would want to

use grep function so i can say grep

and this one needs what i am searching

for so let's say language

and where am i searching for so i'm

searching it in text

and let's do that and that tells me

where is this found so when i do a grep

i am trying to find out if this was

found in my element so here i am saying

text and grep language similarly i can

also use one more function which is

finding out

index positions so i can also find out

index positions

by basically giving the vector and here

i can do a grep pass in my vector

abcd you are searching for b and in your

vector

and that tells you your b is at the

index position 2 d is at the index

position 4. so here we are using some

regular expressions now there are also

other

ways in which r can be used for data

manipulation so let's learn about

factors in r and

how do you work with factors and what

are they for

so when you talk about

factors

so here let's clean this up

and let's see what is this so when you

say

factors here we are talking about

categorical variables

so categorical variables can take only

limited number of different values now

don't be confused with this

histogram example here maybe we can

just look at packages so that it

doesn't get confusing

so when we talk about categorical

variables

we are talking about

variables which can belong only to a

limited set of categories in r there is a

data structure to work with this kind

of variables and that is called your

factor

so with factors we can be sure that all

statistical modeling techniques will

handle such data correctly

so for example you can talk about a

person's blood group and you can say

the blood group could be a or b or a b

or o

so say we collected information about

eight people

and we

recorded this information as a vector

and we can call it blood group so let's

do that so let me

try that here

so if i say

blood

group

and then

i would like to create a vector here

so that

we can look at information about eight

people and their blood group and this

can be

in the form of a vector which can then

be created or converted into factor

by using the factor function

so how do we do that let's say i have

blood group and here i will basically

give some values

so i will use c function and here let's

give some values so for example

let's say

b

let's say a b

and let's say

o

and let's say a again

let's again say o

might be one more o

let's say a

and let's say b

so here we have eight entries and let's

consider we have recorded the blood

group of eight people

and this is in the form of a vector so

for example let me create this now this

is a vector which we have created and

you can always look at the value of this

one

so let's say

blood group

and that basically

okay there was a spelling mistake let's

do blood group

and that basically shows me the values

and here you see all the values that are

in double quotes now we have basically

created a vector

now to convert this vector into factor

we can use the factor function

and how we can do that is basically we

can say for example

let's go here and let's say blood

group

underscore factor

and to convert this vector into

factor i will use the factor function

and then basically pass my

blood group here

and

now i have created a factor and we can

look at this factor by just typing

blood

group factor

and now if you see

it basically shows us a factor

it does not have any double quotes and

you can also see the factor levels for

categorical variables which get printed

out here
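
As a sketch of the steps so far: the eight values below follow the order read out in the video, and the variable names use underscores as spoken. Treat the exact spelling of the names as an assumption.

```r
# Blood groups recorded for eight people, as a plain character vector
blood_group <- c("B", "AB", "O", "A", "O", "O", "A", "B")

# Convert the character vector into a factor
blood_group_factor <- factor(blood_group)

# Printing a factor shows the values without quotes, plus the levels
blood_group_factor
levels(blood_group_factor)
```

With no `levels` argument, factor() takes the distinct values and sorts them, which is why the levels come out as A, AB, B, O.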

now what r is actually doing here

is first r scans through the vector to see

the different categories in there

then r sorts the levels alphabetically

and then it converts

the character

vector to a vector of integer values

so these integers correspond to a set of

character values

to use when the factor is displayed now we

can always call the str function

to find out more details of this

and here i will

pass in blood group factor and let's

look at this one and this one shows me

this factor is with four levels

so inspecting the structure will reveal

that it has four levels

it shows me what are the categorical

variables and it shows me some integers

so here we are dealing with a factor of

four levels

now a is

recorded as one so that would be your

first level

then you have ab

which is recorded here

and that is basically your second level

b is the third level and o is the fourth

level so when

we are looking at this factor we

may ask why do this conversion so

categories could be

very long character strings and each

time repeating a string or an

observation can take up a lot of memory so

using factors and having these levels

can reduce the memory space

now factors are actually

integer vectors

and each integer corresponds to a

category or a level
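
A short sketch of inspecting the integer encoding described above. Note that the function spoken as "structure" is str() in R; structure() is a different function.

```r
blood_group <- c("B", "AB", "O", "A", "O", "O", "A", "B")
blood_group_factor <- factor(blood_group)

# str() reports the number of levels and the underlying integer codes
str(blood_group_factor)

# The integer codes on their own: A = 1, AB = 2, B = 3, O = 4
codes <- as.integer(blood_group_factor)
codes
```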

now to specify a different order of

levels

we can specify levels inside the factor

function how do we do that so let's say

i will say blood underscore factor 2

and here

i will basically

give the same so i will say factor

and this factor will have blood group

which we had created earlier but this

time also i'm going to specify

levels

and then i can basically pass in a

vector here

so within this levels i will specify the

values what are the levels

so here i will say o

then i will say a

then let's say

b

and then let's say

a b

so this is what we are doing here

to specify different order of levels and

we are specifying the levels here now if

i do this that would have got created

and let's look at

blood

factor 2 and that shows me the value

here

below where you have specifically

assigned the levels now if you look at

the previous one where we had blood

group factor where levels were

automatically

understood by r

so we were looking at the categorical

variables we were seeing what are the

levels here

and here what we have done is

we have just created again a factor but

then we have specified levels in a

different order
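
A sketch of specifying the level order by hand, using the order given above (O, A, B, AB). The name blood_factor2 follows the spoken name and is otherwise an assumption.

```r
blood_group <- c("B", "AB", "O", "A", "O", "O", "A", "B")

# Default: levels are sorted (A, AB, B, O)
blood_group_factor <- factor(blood_group)

# Explicit order: O becomes level 1, A level 2, B level 3, AB level 4
blood_factor2 <- factor(blood_group, levels = c("O", "A", "B", "AB"))

# Same values, different integer encoding
as.integer(blood_group_factor)
as.integer(blood_factor2)
```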

and you can obviously call str on

this to compare so for example i'll say

str

and then i will say blood factor 2

and that basically shows me the

structure with four levels so this was

the initial one where we had a ab b

and o

and then there were some integer values

which were corresponding to these

categorical variables here we have given

a different level and we have

a different set of numbers which we see

here

so if we compare structure of blood

factor and blood factor 2 we will see

the encoding is different right now that is

done we can also specify the level names

so what we can do is

just as we use the names function to name the elements of

vectors we can pass a vector to

levels here

and there is

basically

a function that you can use

so let's say i will

say levels

and then within this i'll pass blood

underscore factor

or blood group underscore factor here

so

once this is done

let's say

blood group

so in this one we created blood group

underscore factor

and this one was blood factor two so

that's okay i mean it's just a naming

convention

and here let's pass in levels

to my blood factor

and then what i can do is i can pass in

the values here

so this is

when you would want to give specific

names

and let's create a vector

and let's call it say bt underscore a

and might be you would want to give bt

underscore a b

and then you will give

bt underscore b

and

the final one is bt underscore o

so what i'm doing is i'm doing the

naming for these particular categorical

variables

by using levels
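
A sketch of renaming the levels of an existing factor with levels(). The new names are written here as BT_A, BT_AB, BT_B and BT_O; the exact capitalization is an assumption.

```r
blood_group <- c("B", "AB", "O", "A", "O", "O", "A", "B")
blood_group_factor <- factor(blood_group)

# Assign new names in the same order as the current levels (A, AB, B, O)
levels(blood_group_factor) <- c("BT_A", "BT_AB", "BT_B", "BT_O")

blood_group_factor
```

The assignment only changes the level names; the underlying integer codes stay the same.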

now let's do this

and

it says blood factor not found

so

we have to look which one did we have so

we have

blood group factor so this is what we

should have given so let's say blood

group

factor

blood group factor

and now we have given some names here so

let's look at

the blood group factor now

blood group

underscore factor and now if you see we

have some levels or we have given the

name to the categorical variable so if

you compare this one so here we were

creating a blood group where we had

these variables

and these variables were the categorical

variables

which was just creating a vector

then we created a factor out of it

and then we looked at our factor we

looked at the structure of it

and similarly what we did was we created

a different factor so let me also change

the name here

and let me call it blood group factor 2

but here we were specifying levels in a

different order

let's look at this one so which is blood

group factor 2

and then you can look at the structure

of this one

blood group factor two and here

in this example what we did was the

initial blood group factor what we had

created we have just given

some names to that like what we would do

in case of vector by using the names

function here you are using the levels

function

so we basically

created some levels and let's look at

the blood group factor now

which basically has some

different names

so what we are doing here is we are just

using naming now we can also specify the

categorical variable names

or levels by specifying the labels argument

inside the factor function so that is

basically to give some names or levels

so let's look at this one and how do we

do that so we can specify by using

factor

which basically creates your factor

and then here i'll say blood group

i'm going to specify labels for my

naming

so

in

previous examples we saw how we were

using the levels right and this was

by specifying levels for a different

ordering

and then

we could have also done this by saying

levels and given some different names

or we can just do labels

and then

within this i will say labels equals

and then i can say

c and then let's give these values which

we have bt underscore a bt underscore a

b bt underscore b and bt underscore o

so let me just copy this one again

and let's put it here and then we can

basically do a ctrl enter

so i would have created a factor here

and then

we should remember one thing here that

it is important to follow in

the same order as the order of factor

levels that is a

a b

b or o now these are the

levels what we are seeing so if you look

at any one of these in the beginning

which we had created

it was showing me what levels it has a ab

b and o

and a ab b and o so we are following

the same order but we are using the

labels

within

my factor creation
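
A sketch of naming the levels at creation time with the labels argument. The label spellings are assumptions; the key point is that they must be listed in the same order as the default levels (A, AB, B, O).

```r
blood_group <- c("B", "AB", "O", "A", "O", "O", "A", "B")

# labels are matched to the sorted levels A, AB, B, O in that order
blood_labelled <- factor(blood_group,
                         labels = c("BT_A", "BT_AB", "BT_B", "BT_O"))

blood_labelled
```

Here blood_labelled is a hypothetical name, since the video overwrites the blood group variable instead.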

now sometimes there might be issues

because of wrong ordering so we can

actually use a combination of manually

specifying the levels and labels

arguments when creating a factor

now what we can do there is we can say

factor

and in this case let's say

blood group

which i'm creating

then i will basically say

levels

to give

the right ordering

and here in levels

let's say

o let's say

a

let's say b

and then let's say a b

so this is for my

levels which i'm creating

and then what i can also do is i can go

for

labels

so levels will take care of my ordering

and labels will take care of naming

the categories so let's say labels

and then we can create a vector

and we can give some names so we can say

bt underscore o

what else we have we have bt underscore

a

we can then say bt underscore

b

and finally we can say bt

underscore

a b

and

then let's create this one

so now what we have done is we have

created a blood group which has levels

which is following your ordering which

is following the naming

as we have passed so if you look at the

levels it tells you the names what you

have created

it also tells you all the categorical

variables

which were used

for my blood group and basically these

will have some labels
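
Combining both arguments, as described above, might look like this sketch. levels fixes the order and labels supplies the display names position by position; the result name is hypothetical.

```r
blood_group <- c("B", "AB", "O", "A", "O", "O", "A", "B")

# levels controls the ordering, labels renames each level in that order
blood_ordered_named <- factor(blood_group,
                              levels = c("O", "A", "B", "AB"),
                              labels = c("BT_O", "BT_A", "BT_B", "BT_AB"))

blood_ordered_named
```

Because each label is paired with an explicitly listed level, this form is less prone to the wrong-ordering mistakes mentioned earlier.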

so

we can anytime

look at our

blood

group

which we had created in the beginning

and let's look at the values of those

so when we talk about categorical

variables

there are

two kinds in categorical variables so

you have nominal or you have ordinal now

in nominal you don't have any implied

order for example

blood group o

is not greater than a that

is o is not worth more than a

in any way that we can think of

now trying such comparisons with factors

will generate a warning so

say for example we would want to

look into our

blood

factor and let's look at

what blood group factor contains now

that's the new blood group factor let's

say blood group

factor

and here i will try to pull out a value

here and let's compare this with

blood

factor

and let's look at some other value now

in this case

we see not meaningful for factors so it

cannot really compare the categorical

variables and see if one

variable is greater than other or has

more worth
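
A sketch of what happens when two elements of an unordered (nominal) factor are compared. Which two elements the video compares is not stated, so the indices below are an assumption.

```r
blood_group <- c("B", "AB", "O", "A", "O", "O", "A", "B")
blood_group_factor <- factor(blood_group)

# Comparing elements of a nominal factor gives a warning
# ("'<' not meaningful for factors") and the result is NA
result <- blood_group_factor[1] < blood_group_factor[2]
result
```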

now

there can be many examples where such

ordering does exist and in r we can

impose such ordering in factors thus

making it ordered factor so inside

factor

we can set the argument ordered is true

and we can do that now for example you

would look at

the size of a dress so let's say

dress size

and here i will say for example let's

create a vector

and let's say medium

let's say large

let's say

small

let's say again small

and then let's say large

let's say medium

again an entry of large

and then let's say

medium

so here i'm creating a vector

and let's see if we missed out any

quotes or comma

so it says unexpected symbol

and where is that so let's look at this

one so we have dress size

and we are looking at c so i am saying m

l

s

s

l and here is a quote missing

and that was the reason so

and this one also has a quote missing

and now it should resolve yeah so let's

look at this one and now we have created

a vector called dress size now obviously

you can create a

factor of this so i'll say dress size

underscore factor

where i would want to look at the

ordering of this so let's create a

factor

and in factor we will pass our vector

on which we we want to convert or we

want to create a factor

we will say ordered

equals true

so i am specifying a particular ordering

and then i can also specify levels as we

saw earlier

so in levels we will give the category

so what categories we have

so we have small

we have medium

and we have large so these are the three

levels which we have

and let us create this as a factor now

that's done
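
A sketch of the ordered factor built above. The video reads the sizes as medium, large, small and so on but appears to type single letters; the short codes M, L and S used here are an assumption.

```r
dress_size <- c("M", "L", "S", "S", "L", "M", "L", "M")

# ordered = TRUE makes this an ordinal factor with S < M < L
dress_size_factor <- factor(dress_size,
                            ordered = TRUE,
                            levels = c("S", "M", "L"))

dress_size_factor

# Comparisons now work: is the first size (M) smaller than the second (L)?
dress_size_factor[1] < dress_size_factor[2]
```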

and what you can do is you can look at

the

factor and we can also do a comparison

so let's for example look at our factor

what does it contain it has some levels

and if you closely notice

there are these levels which also show an

ordering of which level is less than

the other so you

can look at

dress size factor which has some

ordering which we have implemented and

now let's do a comparison between dress

size

and

compare it with some other variable and

see what is the result so now it says if

it is true or if it is false earlier we

were not able to do that because we did

not have any ordering and if we were

looking at the variables we were not

really clear

if one variable has more worth than

others so these are some simple examples

what we have seen now we can also look

at some more examples so say for example

i do a type here

now that basically is creating a vector

if i would want to compare the element

that is type 3

is it greater than type 4 it shows me

false

right now here what we are seeing is

that

if you are looking at a particular value

okay

we can basically see

that there is some comparison happening

here if i compare this with 1

and 2 which tells me true or false

and if i look at this it also does some

comparison so i can always

convert this into factor by using the

factor function so i can do this

if i'm checking

if for example i would want to

create a nominal factor i can do a type

dot factor and it tells me it is true

you can also do a type dot factor 2

and then use the factor function pass in

your type which is a vector and here you

are saying ordered as true which we just

now saw

and now you look at type dot factor 2

which is creating an ordinal

type of variables

now here we can again create type dot

factor 3 so what we are doing here

in this case

we had a vector

we basically

said

type dot factor

we said factor is of true

and then we looked at the nominal factor

we also did a factor 2

and then we created factor but we

specify ordered as true

so we get ordinal

and now if you look at type dot factor 3

here you are saying ordered and you are

also specifying levels like what we did

in the previous example and now

you would look at ordered factor with

user given order which also has the

levels which clearly show us a

comparison between those now we can take

a different example we can say type dot

factor 4

we are using the factor function i am

specifying type which is a vector i am

saying ordered is t

i am using

levels which is giving me

some levels

and then we also have labels which are

basically going to have the naming

convention so let's look at this one and

look at type dot factor 4 so it tells me

what are the categorical variables

which are small medium large small large

medium

these are for my

type values which we created a vector

here these are my type values

for which we created a vector

we said ordered is true the levels is

small medium large and we gave some

names so we are looking at the values of

this so this basically helps you to work

on your categorical variables when you

can then compare the values

and you can see what does it show

now here what we are doing is we are

creating a different vector we say small

tall tallest medium small and so on

let's look at this one which is

basically

type and it has the value

so what we would want to do is we would

want to compare height type of first

value with the fourth value

so for that

let's create a vector

on this type ordered is true level is we

are saying

small medium tall and tallest

these are the levels

and now when you look at your type dot

factor 5

it basically shows me

what are the levels which you have

specified so small is the smallest then

you have medium which is bigger than

small tall is bigger than medium tallest

is bigger than tall we have assigned

some levels and based on these levels

now you can compare

your values in this factor type dot

factor 5

take the first value which is small and

compare it with the fourth value which

is medium and you will know if small is

greater than medium so the result would

be false
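
A sketch of this height comparison. The video only reads out the start of the vector ("small tall tallest medium small and so on"), so the five values below are just that visible prefix, not the full data.

```r
# Only the first five values are known from the video; the rest is omitted
type <- c("small", "tall", "tallest", "medium", "small")

type.factor5 <- factor(type,
                       ordered = TRUE,
                       levels = c("small", "medium", "tall", "tallest"))

# Is the first value (small) greater than the fourth value (medium)?
type.factor5[1] > type.factor5[4]

# The underlying integer codes follow the level order
as.integer(type.factor5)
```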

now i can also convert this into integer

and

i can continue working on this

now here you have

basically

a sequence

so let's use the sequence function where

i'm starting from 0 ending to 20

and there is a jump of 2 so that

basically creates a vector

let's look at the vector value here

and if you would want to sort the vector

so we are using an inbuilt function

wherein let's create this

vector with these numbers i can do a

sorting i can also do a sorting with

decreasing is true

you can do a reversing of vector so

these are some examples of inbuilt

functions which we have already

discussed so here you are doing a

reverse you are finding out the

structure you want to append two vectors

you want to check the class of an object

you want to convert a vector into a list

using as dot list

converting the vector into a matrix

you are taking a sample

with two random values between 10

and 20 so these are some inbuilt

functions which we have already

discussed such as taking a

vector and getting an absolute value or

getting a sum of it or a mean of it

or rounding it to two decimal

places getting the ceiling value getting

the floor value truncating it

returning the log getting the

exponential value and so on
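
The recap above can be sketched with a few calls. The sequence follows the description (0 to 20 in steps of 2); the value passed to the rounding functions is a hypothetical example.

```r
v <- seq(0, 20, by = 2)          # 0, 2, 4, ..., 20

sort(v, decreasing = TRUE)       # sort in descending order
rev(v)                           # reverse the vector
class(v)                         # class of the object
as.list(v)[1:2]                  # vector converted to a list (first two items)
sample(10:20, 2)                 # two random values between 10 and 20

y <- 3.14159                     # hypothetical value for illustration
round(y, 2)                      # round to two decimal places
ceiling(y)                       # next whole number up
floor(y)                         # next whole number down
trunc(y)                         # drop the decimal part
```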

now we have also looked at regular

expressions earlier so regular

expressions let's just revisit that

so here you are basically creating a

variable called text and then you can

just do a grep

you can say what you would want to

search and where you would want to

search it and that would give you the

logical value indicating if the pattern

was found

you can try to search something else

which might not be found you can also

search for independent values like this

and that basically can give you the

position of that particular object

within the vector
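
A sketch of the pattern matching revisited above. One point worth noting: grep() returns index positions, while grepl() returns the logical found or not found value. The pattern "b|d" for the second example is an assumption, chosen because the video reports positions 2 and 4.

```r
text <- "R is a programming language for data science"

grep("language", text)    # index of the matching element
grepl("language", text)   # TRUE if the pattern is found
grepl("python", text)     # FALSE, pattern not present

# Index positions within a vector of separate values
letters_vec <- c("a", "b", "c", "d")
grep("b|d", letters_vec)  # positions of b and d
```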

and here is one more example of working

with timestamps so for example

if i just call sys dot date it

returns the current system date

if i would want to

set that as a variable and then call

that variable it shows me

our current time

i can also use as dot date

and then let's look at this one so as dot

date and this

would be converted into date and then

you can obviously use formatting

techniques like

getting the month getting the day

getting the year

so here we are passing in the date and

then we are saying what format we would

be interested in

and that basically gives us

the data in a particular format so

that's also useful when you have your

time series data or when you would want

to convert the data types and so on now

there are different ways in which you

can do formatting so for example in this

one we were saying

month day and year

i can also say

for getting the full month name or

getting the full four digit year i can use

capital letters in the format

so i can look at this one and that

basically shows me

my

date in a particular format
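
A sketch of the date handling described above. The date string is a hypothetical example, and the month name produced by %B depends on the system locale.

```r
today <- Sys.Date()            # current system date
today

d <- as.Date("2023-05-17")     # hypothetical date, parsed from a string

# Lowercase codes: two digit month, day and year
format(d, "%m/%d/%y")

# Capital codes: %B is the full month name, %Y the four digit year
format(d, "%B %d %Y")
```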

so these are some inbuilt functions

which we are seeing and before this we

were seeing factors which is mainly to

work with categorical variables

either they have levels auto assigned

and they might not have labels

so you can give labels you can give

levels you can control the ordering you

can give levels in a different way so

that you can have a different ordering

so this is how you use factors and work

on categorical variables maybe that is

nominal or ordinal and easily you can

do your statistical computations on such

data

let's learn about data manipulation in r

and here we will learn about

the dplyr package

and when we talk about this dplyr

package it is often faster and

easier to read than base r so the dplyr

package is used to transform and

summarize tabular data with rows and

columns you might be working on a data

frame or you might be getting in an

inbuilt r data set which can then be

converted into a data frame so we can

get this package dplyr by just

calling in library function

and this can be used for grouping the

data summarizing the data adding new

variables selecting different set of

columns filtering our data sets sorting

it selecting it arranging it or even

mutating that is basically creating new

columns using functions

on existing variables so let's see how

we work with dplyr now here

i can basically get the package here so

i can just say

install dot packages dplyr now we

already see the package here which

is showing up so i will just select this

one i can do a control enter and that

will basically set up the package

package dplyr successfully

unpacked

so that is done now you can start using

this package by just doing a library

dplyr

and this was built it shows me my

version of r so let's also use a

data set package that is nycflights13 so

we can do install dot packages and that

will search

and get that relevant data set i can

again call it by using library function

now once that is done we can look at

some sample data here by just doing view

flights and that shows me the data in a

neat and a tabular format which shows me

year month day

departure time scheduled departure time

and so on

now we can also do a head to look at

some initial data

which can help us in understanding the

data better so what is this data about

how many columns we have what are the

data types or object types here

it shows me how many variables we have

so this is fine now we can start using

dplyr and

in that we can use say filter function

if we would want

to look in for specific value now here

we have the column as month so i will do

a filter now i'm creating a variable f1

i'm using the filter function

on flights which we already have

and then what we can do is we can

basically

look at the month where the month value

is 0 7

so let's look at that

and this one

you can do a view on f1 which shows me

the data wherein you have filtered out

all the data based on month being 7.

so this is a simple usage of filter we

can take some other example we may want

to include multiple columns so we can

say f2 filter

flights and here we will say month

is equal to 7 day is 3

and then look at the value of f2 if you

are interested in seeing this

and that tells you the month is 7 and

day is 3 you could also look at it in a more

readable format by using view on f2 and

that gives me my selected result so we

are just extracting in some specific

value we can keep extending this so here

we can say flights

is what we would want to work on i'm

using the filter function so i can

straight away

instead of creating a variable then

doing a view i can also do a view in

this way i can just pass in my filter

within the view and within this i am

saying filter i would want to look at

the flights month being 0 9 day being 2

and origin being lga

and then that shows me the value here

and obviously you can scroll and look at

all the columns and if you see the

origin column it shows the selected

value so now we have filtered out our

data based on values

in three different columns
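
A sketch of the three filter calls walked through above, assuming the dplyr and nycflights13 packages are already installed. The name f3 is hypothetical; the video passes that last filter straight into View().

```r
# install.packages(c("dplyr", "nycflights13"))  # run once if not installed
library(dplyr)
library(nycflights13)

# Rows where month is 7
f1 <- filter(flights, month == 7)

# Conditions separated by commas are combined with AND
f2 <- filter(flights, month == 7, day == 3)

# Three columns at once: month, day and origin airport
f3 <- filter(flights, month == 9, day == 2, origin == "LGA")

head(f3)
```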

now

what we can also do is we can use and or

we can use or operators so

i could have done this

in a little different way so i could

have said head which shows me

initial result

i will pass flights so within my head

function i am passing in this

and what does that contain so you are

saying flights and in this flights data

set

you would want to pick up the month

being the column so we use the dollar

symbol here we give a value and i'll

say and and i'll again say flights

wherein i will select the day being two

and and and remember when you talk about

and it is going to check if all the

conditions are true so then you say

flights origin

lga and you look at the value so in

this way i can

filter out specifically multiple values

by specifying columns now we could have

done it in this way we could have

created a view or we could have assigned

this to a variable and then done a view

on that where we could have selected

month day and origin

or you can be more

specific

in specifying all the columns it makes

the code more readable so let's look at

the values and here you are looking at

head which shows me based on month

day

and then you can look for further

columns for other variables that is

origin being lga
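
The base R version with the dollar symbol and the and operator might look like the sketch below. It assumes the element-wise operator `&`, since the conditions are evaluated for every row; the result name is hypothetical.

```r
library(nycflights13)

# Each condition gives a TRUE/FALSE value per row; & keeps rows where all are TRUE
lga_rows <- flights[flights$month == 9 &
                    flights$day == 2 &
                    flights$origin == "LGA", ]

head(lga_rows)
```

This gives the same rows as the dplyr filter call, but the data set name has to be repeated for every column.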

now what we can also do is we can do

some slicing here to select rows by

particular position so i can say slice

and i would want to look at

rows one to five and i can do this

so you can always assign or look at the

view of this

i can just do

here so when i did a slice one to

five it shows me

my entries

for one to five

now similarly we can do is slice five to

10

and now you are looking at

5 to 10 values
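
A sketch of slicing rows by position, assuming dplyr and nycflights13 are installed. The result names are hypothetical.

```r
library(dplyr)
library(nycflights13)

first_rows <- slice(flights, 1:5)    # rows 1 to 5
next_rows  <- slice(flights, 5:10)   # rows 5 to 10, six rows in total

first_rows
next_rows
```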

so you can always look at the complete

data and then you can slice out

particular data now mutate is usually a

function which is used when you would

want to apply some computation on a

particular data set

and then you would want to

add it to

your

existing data frame or you would want to

add a new column so this is where you

use

mutate which is mainly used to add new

variables so let's see how you work on

mutate

so

it's pretty simple so you create a

variable overall delay now i would want to

do a mutate so that it adds a new column

so i'm selecting my data which is flights

i will call the new column as overall

delay

and then basically

i can look at

overall delay being arrival delay minus

departure delay so let's create this and

let's look at view of this which shows

me

or which should show me my new column

which is overall delay which was not in

my original data set so you can anytime

do a head on this one to compare the

value so this one shows me arrival delay

and then there are many other variables

what you can also do is you can do a

view

and you could have just looked at flights

if you would want to compare

so you can look at the flights and this

one would not have any

overall delay column so it basically

shows me 19 columns only

what we see here

and if you

do a view

on overall delay then that basically

shows me 20 columns so we know that the

new column has been added

to

this overall delay so if you would want

to work with 20 columns you will use

overall delay if you would want to work

with your original data set you will use

flights now you can also use a transmute

function which is used to show only the

new column so we can do an overall delay

and at this time we will say transmute

we will say flights overall delay

the computation remains same but at this

time if i look at view on overall delay

it only shows me the new column so

sometimes we may want to compute result

based on two variables or two columns

and just look at the new value

and then we can decide if we would want

to add it to our existing structure
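
A sketch of mutate versus transmute as described above, assuming dplyr and nycflights13 are installed. The video reuses one variable name for both results; two separate hypothetical names are used here so they can be compared.

```r
library(dplyr)
library(nycflights13)

# mutate keeps all existing columns and adds the new one
with_delay <- mutate(flights, overall_delay = arr_delay - dep_delay)

# transmute keeps only the newly computed column
only_delay <- transmute(flights, overall_delay = arr_delay - dep_delay)

ncol(flights)      # original number of columns
ncol(with_delay)   # one more than the original
ncol(only_delay)   # just the new column
```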

now you can also use summarize

and summarize basically helps us in

getting a summary

based on certain criteria so we can

always do a

summarize

and

what we can do is we can look at our

data

and we can say on what basis we would

want to summarize this particular data

so we can do a summarize function now

summarize on flights i will say average

air time

and i would want to calculate an average

so for that i am using inbuilt function

called mean

i will do that on airtime column

so

let's look at flights once again and

here we can see there is

arrival time not air time sorry arrival

time and we would want to do some

average on this particular data we would

want to summarize this so what i'll do

is i will use the summarize function

i will say average airtime and this one

i will look at mean of air time so let's

see if there is an air time column

let's look at this one overall delay

and yes we have an airtime so we were

actually looking at summarizing based on

airtime not the arrival time

so air time is how much time is spent in the air

for this particular flight and we will

want to use the summarize function

not the transmute so summarize flights

average air time and this one we will

calculate as the mean of air time

and

i will also set na dot rm which

i'm saying is true so let's do this and

that basically shows me the average air

time is 151

i can also do a total air time where i am

doing a summation of values or i can get

the standard deviation or i can

basically get multiple values such as

mean

i can say

total airtime where i am doing a

summation

and then i can look at other values

which is if you would want to put in

standard deviation here you could do

that so let's look at the result of this

summarize and this basically allows me

to get some useful information which is

summarized based on

a particular function such as mean sum

standard deviation

or

all three of them
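
A sketch of the summarize calls above, assuming dplyr and nycflights13 are installed. The output column names are assumptions based on the spoken names. The video reports an average air time of roughly 151.

```r
library(dplyr)
library(nycflights13)

# na.rm = TRUE drops missing air_time values before computing each statistic
air_summary <- summarise(flights,
                         avg_air_time   = mean(air_time, na.rm = TRUE),
                         total_air_time = sum(air_time, na.rm = TRUE),
                         sd_air_time    = sd(air_time, na.rm = TRUE))

air_summary
```

Without na.rm = TRUE, a single missing value in air_time would make every result NA.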

now

let's look at grouping by so sometimes

we may be interested in summarizing the

data by groups and that's where we use

the group by function so we can always

use

the group by clause

now

here we are taking a different data set

so we will say for example let's look at

head of mtcars

and that is basically my data set

mtcars now that shows me the model

of the car

it shows me the mileage cylinders displacement

and your horsepower and various other

characteristics or variables in this

particular data set

so here

we can say let's do a grouping by gear

so there is a column called gear so i

will call it by gear i will look at my

data set and then what i am using here

which you see with these percent and

greater than symbols is called

piping so that basically

feeds the previous data frame into the next

function so this is sometimes useful and you

can get this by just saying control

shift and m and you can then use this so

we are going to have

piping so i am saying mtcars now

this is my original data set where i did

a head

or i could have done a view on this one

if you would want to see it in a more

readable format and that basically shows

me the data so we are using a different

data set so i want to group it by the

gear column so i'm going to call it by

gear

and

this one takes my data that is mtcars

i'm using the piping and then i'm

saying group the data based on gear

column that's done now let's look at the

value of by gear

or

you can always do a view so remember

whenever you're doing a group by it is

giving you an

internal object where your data is

grouped based on a particular column

so we can look at the values here you

can do a view that shows you

your data grouped based on a particular

column

now i can again use the summarize

function

where i would want to now work on the

new one where it was grouped based on

gear so i am doing a summarize and here

i am going to say gear 1 which will be

having the value of summation on the

gear column

and then i am saying gear 2 which is

mean well you could give some meaningful

names to this

and let's look at the value of this one

where we are basically now looking at

the values which is sum

and mean values based on the gear

similarly we can look at a different

example so we can say by gear

and i am again using piping

but earlier we had taken gear

we had grouped the data

and we called it by gear so we took our

original data set mtcars but now

within this particular data which was

grouped by gear

i will take this data set i will use the

piping and i will summarize it where i

am saying within this particular data

set i would want to get the sum or i

would want to get the mean and then you

can look at the values so

what you are doing is

you are

either looking at your original data set

or you're looking at the data which was

already grouped and then you can look at

the values

now here what we can do is we can group

by cylinder say might be you are

interested in looking at data which is

summarized based on the cylinder column

you can do that and then for this by

cylinder i am doing a piping where i am

using the summarize function and

summarizing will then be done based on

the mean values of the gear column or

the horsepower

so let's do this

and then you can basically look at the

value at any point you may want to look

at the data set again so just go ahead

and you can look at what does the value

contain

and

by cylinder or by gear and do a head and

it gives you the value

so

you can always do some summarizing or

grouping in these ways
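
The grouping steps described above can be sketched as follows, assuming dplyr is installed. The summary column names are illustrative choices, following the remark that meaningful names are better than gear 1 and gear 2.

```r
library(dplyr)

# Group mtcars by gear, then summarise within each group.
by_gear <- mtcars %>% group_by(gear)
gear_summary <- by_gear %>%
  summarise(gear_sum = sum(gear), gear_mean = mean(gear))

# Same pattern, grouped by the number of cylinders.
by_cyl <- mtcars %>% group_by(cyl)
cyl_summary <- by_cyl %>%
  summarise(mean_gear = mean(gear), mean_hp = mean(hp))

gear_summary
cyl_summary
```

Each summary has one row per group, so both results here have three rows.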

now here we are going to use the sample_n

function and the sample_frac

function for creating samples

so for this

let's take the flights data set again

and we would want to

get 15 random values now that is done

and it shows me 15 rows with some random

values from the data what you can also

do is you can take a portion of data by

using sample_frac

and here i'll say flights i'll

say 0.4 which will return 40 percent of

the total data so this can be useful

when you are building your machine

learning where you would want to split

your data into training and test might

be you are interested in some portion of

the data so you can do this

which is very useful function and then

you can look at the value of that now

what we can also do is we can use the

arrange function so like we were doing a

grouping by or we were trying to pull

out a particular column so in the same

way we can use arrange which is a more

convenient way of sorting than your base

r sorting so for the arrange function

let's do a view

based on arrange so we will work on the

flights data set which we have

and here what we would want to do is we

would want to arrange the flights data

set

which is based on year and departure

time and we are doing a view out of it

so that basically

gives me the data which is arranged

based on

your year and departure time now i can

do a head to give me some highlighting

of that data
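
The sampling and sorting calls described above can be sketched as follows, assuming dplyr is installed. The data frame `flights_demo` is a hypothetical 20-row stand-in for the flights data, and `set.seed` is only there to make the random draws repeatable.

```r
library(dplyr)

set.seed(1)  # only so the random draws are repeatable

# Hypothetical stand-in for the flights data set.
flights_demo <- data.frame(
  year     = rep(2013L, 20),
  dep_time = sample(500:2300, 20)
)

random_rows <- sample_n(flights_demo, 15)             # 15 random rows
forty_pct   <- sample_frac(flights_demo, 0.4)         # 40 percent of the rows
sorted      <- arrange(flights_demo, year, dep_time)  # ascending by default

head(sorted)
```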

now

the piping operator what we are using

can be used in these ways also so here i

will say df i will just assign the data

set mtcars to it let's look at the

df which has basically your different

models you can obviously

look at the head or view of it to look

at useful information we can also go for

nesting options which can be useful

so we are

creating a variable called result here

now that has the arrange function

so what does this arrange function do so

when we would want to use arrange to

sort the data so i would want to sort

the data but what data would i sort so i

will use sample n

which will give me some portion of the

data or some sample data now what is

that sample data so here we are using

nesting that is

earlier when we did a sample we just

said data and how many random samples we

want but instead of giving that what we

are going to do is we are going to use

filter here

now this filter will work on df

so filtering will happen based on the

mileage which is greater than 20

i will say size is 5 and i would want to

basically arrange this in a descending

order so i'm using desc

on this particular mileage column by

default it is always ascending

so let's get the result out of this

which will basically show me the mileage

details in a descending order so this is

my data frame and now

we can look at the result what we have

created

so just do a view or do a head

and look at the view so here you see

mileage

where the highest value is on the top

and we were only interested in five

values in a random sample so that's why

when you did a view it shows your five

values

and it shows in a descending order based

on mileage so we have

not only used an inbuilt function

we have not only arranged the data that

is we have sorted the data but we have

sorted the data based on a descending

order on a particular column we have

said the value should be greater than 20

and we have also said we just need five

random samples
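
The nested call described above can be sketched as follows, assuming dplyr is installed. The seed is an addition so the five sampled rows are repeatable.

```r
library(dplyr)

set.seed(42)  # only so the random sample is repeatable
df <- mtcars

# Read from the inside out: filter, then sample 5 rows, then sort descending.
result <- arrange(sample_n(filter(df, mpg > 20), size = 5), desc(mpg))
result
```

The innermost call runs first, which is why nested code can be hard to read once several steps are involved.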

now let's look at some other examples so

you can always do a multi assignment

so i can say filter wherein i am going

to use

df which was assigned mtcars i am

going to say mileage should be greater

than 20

then i say b which is going to get a

sample out of a

and i just want 5 random values so let's

look at that so we have b which is

going to get a

set of 5 values from a

now i will create a result variable

which will arrange b which is sample

data in a descending order now let's

look at the result of this and that

basically shows me what we were seeing

earlier so you can do a multi assignment

where you can create a variable get a

sample out of it and then basically

whatever is that result you can arrange

that or sort that in a descending or by

default ascending order
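
The multi assignment version described above can be sketched as follows, assuming dplyr is installed. The variable names a and b follow the narration, and the seed is an addition for repeatability.

```r
library(dplyr)

set.seed(42)  # only so the random sample is repeatable
df <- mtcars

a <- filter(df, mpg > 20)        # keep cars with mileage above 20
b <- sample_n(a, size = 5)       # take 5 random rows from a
result <- arrange(b, desc(mpg))  # sort the sample in descending order
result
```

Each intermediate step is stored in its own variable, which makes it easy to inspect a or b along the way.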

so same thing we can do it using pipe

operator

so piping so here i will say result

i'm passing in my df that's the data set

i'm using piping and which basically

tells what you need to do on this

particular data set so i'm going to

filter out the data based on mileage 50

sorry mileage 20 then i'm going to push

that

or forward it to

get the random sample and whatever is

this random sample is going to be pushed

so you are arranging this in a

descending order so this is one more way

of doing it and then basically you can

look at the result so these are some

simple examples where you can use

dplyr with multiple assignments or using

your nesting to filter out the data

you can also do a

arrange which is to sort the data you

can get some random samples out of it

you can summarize the data

you can also

summarize the data based on one or two

or multiple columns and you can use some

inbuilt functions to summarize the data

based on some

functions which are applied on the

variables or on the columns

you can transmute it

where you would be interested in only

looking at one column

you can mutate it where you want to add

a new column

you can slice it

and you can give the conditions where

you can say and or or to filter out the

data
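
The piped version of the filter, sample and arrange steps described above can be sketched as follows, assuming dplyr is installed. The seed is an addition for repeatability.

```r
library(dplyr)

set.seed(42)  # only so the random sample is repeatable
df <- mtcars

# Each step feeds its result into the next one, read top to bottom.
result <- df %>%
  filter(mpg > 20) %>%
  sample_n(size = 5) %>%
  arrange(desc(mpg))
result
```

This produces the same kind of result as the nested and multi assignment versions, without intermediate variables.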

so what we can also do is on this

particular data set which we have say

for example df

where i have my data let's look at this

one and if i just do a df at this point

it shows me my data set and if you would

be interested only in particular column

then dplyr also allows you to

either we can do a filter or we can

simply do a select

now for selecting we can choose

our data so for example i'll say df

underscore i'm interested in mileage i'm

interested in horsepower

might be i am interested

in

your cylinders in this

and for this one what i can do is when i

would want to do a select

i can basically say

selected

df let's call it some name

i can say

control shift m

which is for piping

and then basically what you can do is

you can do a select

and you can choose your columns so i was

interested in mileage i was interested

in

horsepower

i was interested in cylinder and here

what i'm doing is i'm using a select

where i can look at the new data frame

so let's do this

and

i'm sorry here we will have to give it

df

this is where

you are passing in your data

yeah now this one is done and we can

look at the value of this one by just

doing a df

or

head

on df

underscore

mileage horsepower cylinder and look at

the selected result so you can be

looking at selective columns i could

have done this filter but filter will

always look for

a condition

say your mileage is greater than 20 or

might be your cylinders are more than 4

or something else but when you do a

select you are selecting specific

columns
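
The select step described above can be sketched as follows, assuming dplyr is installed. The variable name `df_mpg_hp_cyl` is an assumed spelling of the name read out in the narration.

```r
library(dplyr)

df <- mtcars

# select() picks columns by name; filter() would pick rows by a condition.
df_mpg_hp_cyl <- df %>% select(mpg, hp, cyl)
head(df_mpg_hp_cyl)
```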

so view always gives you all the columns

head gives you highlight but then select

can be useful when we are interested in

looking at only specific data so this is

how you can use dplyr for

manipulation

for your data transformation for

basically filtering out the data by

selecting particular data and then

working on it so similarly there is one

more package called tidyr and we'll see

how data manipulation can be

done using the tidyr package

let's

learn about the tidyr package it makes it

easy to tidy your data

and this basically helps you create

cleaner data

so

which is easy to visualize and model now

this comes with mainly four functions so

you have gather which makes

wide data

longer so that is basically used to

stack up multiple columns you have the

spread function which makes long data

wider that is unstacking the data

so if you would want to unstack data

which has the

same attributes then spread can

spread the data across multiple columns

you have separate which is a function

which splits a single column into multiple

columns

and to complement that you have one more

function which is unite and that

combines multiple columns into a single

column so these are the four main functions

which are used in the tidyr package so

let's look how we work with this

so let me bring up my r studio here now

for this

first is let me just clean up my screen

here doing a control l so i will install

the package it is already installed but

we can just do a control enter

and then it asks do you want to

restart r prior to

install i'll say okay

and it is basically going to get the

package

now it says package

tidyr has been

successfully unpacked let's use that

package

using the library function

and that was built under r version 3.6

now i can basically

start using these functions so for

example here we are creating a data

frame so let's say n is 10

and then we basically would say

we will call it wide

now that's the variable name i'm using

the data.frame function

i'm saying id which will be

1 to n so that will take the values from

1 to 10 and then these are the values

which have

10 entries so these are the vectors face one

face two face three let's create a

data frame out of it now that's done we

can have a look at our data frame by

just doing a view wide and that shows me

the id column and it has face dot one

face dot two and face dot three now we

can use our function so for example we

can work with gather that is reshaping

the data from wide format to long format

and basically you can say stacking up

multiple columns

so let's see how we do that here i'll

call it long i'm working on wide i'm

using the piping

functionality and then i'm using gather

so this one i will say what will be the

data which i will use

so we are using wide as a data frame

then i am saying response time so that

will be basically one more column and

then you have your columns which you

would want to

basically stack so i'm saying from face

one to face three so let's do this
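
The wide data frame and the gather step described above can be sketched as follows, assuming tidyr and dplyr are installed. The response time values are hypothetical, and the column spellings `ID`, `Face.1` to `Face.3`, `Face` and `ResponseTime` are assumptions based on the narration.

```r
library(tidyr)
library(dplyr)  # loaded for the %>% pipe

n <- 10

# Hypothetical response times; one column per face.
wide <- data.frame(
  ID     = 1:n,
  Face.1 = c(411, 723, 325, 456, 579, 612, 709, 513, 527, 379),
  Face.2 = c(123, 300, 400, 500, 600, 654, 789, 906, 413, 567),
  Face.3 = c(1457, 1000, 569, 896, 956, 2345, 780, 599, 1023, 678)
)

# Stack Face.1 to Face.3 into a key column (Face) and a value column.
long <- wide %>% gather(Face, ResponseTime, Face.1:Face.3)
head(long)
```

Ten rows and three stacked columns give 30 rows in the long format.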

and once this is done let's have a look

at our variable long so this one shows

me that i have an id column

i have the response time column and i

have the face column which we mentioned

and that basically has all the values

stacked in so you have face dot one face

dot two and face dot three so all the

columns are being stacked here with all my

data so now i have 30 entries in total in

this one so this is basically using your

gather function now sometimes we may

want to

use

a separate function now separate

function is basically splitting a single

column

into multiple columns so which we

would want to use when multiple

variables are captured in a single

variable column okay so let's look at an

example of this one so let's say long

separate that's what we will call we

will work on this long which has all the

data stacked in

as the columns we selected then i'm

saying separate i want the face column

and then i would say

when i separate the columns what are my

column names now i could also give a

separator by giving a comma and then

mentioning the separator if that is

required so let's do this

now once this is done let's have a look

at our long separate so what we see here

is the

column which we used so we were doing a

face column and that was to be split and

we wanted to split it into target and

number so that's what we see here so you

have face being split into target and

number and then you have the response

time so this is how you use the separate

function now there is also something

called as unite function which is

basically a complementing of separate

function so it takes multiple columns

and combines the elements to a single

column so for example here

we will

call it long unite

and we will take long separate which was

separating the data we want to unite so

we will take face target

number

and we want to have a separator between

them so let's basically do this

and now let's look at the result of this

unite

so you see you have the target and number

merged together so you have face dot one

the separator is dot as we have

mentioned and we have united multiple

columns

so this is one more function of

tidyr which helps you

basically

tidy up your data or put it in a

particular way
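
The separate and unite steps described above can be sketched as follows, assuming tidyr and dplyr are installed. The small `long` data frame is a hypothetical stand-in for the stacked data, and the column spellings are assumptions based on the narration.

```r
library(tidyr)
library(dplyr)  # loaded for the %>% pipe

# Hypothetical long data: two IDs measured on three faces.
long <- data.frame(
  ID           = rep(1:2, times = 3),
  Face         = rep(c("Face.1", "Face.2", "Face.3"), each = 2),
  ResponseTime = c(411, 723, 123, 300, 1457, 1000),
  stringsAsFactors = FALSE
)

# Split "Face.1" into Target = "Face" and Number = "1".
# By default separate() splits on any non-alphanumeric character.
long_separate <- long %>% separate(Face, c("Target", "Number"))

# Glue the two columns back together with a dot in between.
long_unite <- long_separate %>% unite(Face, Target, Number, sep = ".")

long_separate
long_unite
```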

now then you have your spread function

and this is basically for unstacking so

that is if you would

want to unstack data which has the

same attributes spread can be used so

that you can spread the data across

multiple columns

so it will take two columns say key and

value and spread it into multiple

columns so it makes long data wider so

we can look at this one we will say long

unite

i'm using the piping i will use the

spread function i'll work on the face

column and response time and let's do

this and then let's do a view on this

so it tells me our data is back in the

shape as it was in the beginning so

these are four functions

which are very helpful when we work with

the tidyr package
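
The spread step described above can be sketched as follows, assuming tidyr and dplyr are installed. The small `long_unite` data frame is a hypothetical stand-in for the united data, and the column spellings are assumptions based on the narration.

```r
library(tidyr)
library(dplyr)  # loaded for the %>% pipe

# Hypothetical long data with a key column (Face) and a value column.
long_unite <- data.frame(
  ID           = rep(1:2, times = 3),
  Face         = rep(c("Face.1", "Face.2", "Face.3"), each = 2),
  ResponseTime = c(411, 723, 123, 300, 1457, 1000),
  stringsAsFactors = FALSE
)

# Each distinct value of Face becomes its own column again.
back_to_wide <- long_unite %>% spread(Face, ResponseTime)
back_to_wide
```

Current tidyr documentation marks gather and spread as superseded by pivot_longer and pivot_wider, although both still work.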

so let's learn about

visualization and here we will learn

about

r which can be used for your

visualization now

one thing which we need to understand is

because of our ability to see patterns

which is highly developed we

can understand the data better if we can

visualize it

so the efficient way or effective way to

understand what is in our data or what

we have understood in our data we should

or we can use graphical displays that is

your data visualization so there are

actually two types of data

visualizations so you have exploratory

data visualization which helps us to

understand the data and then you have

explanatory visualization which helps us

to share our understanding with others

so when you talk about r

r provides

various tools and packages to create

data visualizations

and which can be used for both kind of

data analysis or both kind of

visualizations

so when you talk about exploratory data

visualization the key is to keep all

the potentially relevant details

together

now the objective when we talk about

exploratory data analysis is to

help you see what is in your data

and the main question is how much

details can

we interpret

now when you talk about different

functions which we see here such as plot

which is more for a generic

plotting you have bar plot which is used

to plot data using rectangular bars or

you can say creating bar charts you have

histogram or hist function to create

histograms where you look at the

frequency

of

the data which is basically used to look at

the central tendency of the data you

have box plot which is used to represent

data in the form of quartiles you have

ggplot2 which is a package which enables

the user to create sophisticated

visualizations with little code

using the grammar of graphics

and then you have plotly or plot ly it

creates interactive

web-based graphs via the open source

javascript graphing library now before

we see some examples here let's also

talk about

when you talk about plotting let's also

try to understand what kind of

plots you can have and what kind of

techniques you have so let me open up my

r studio here

now for example i can pull out a

particular data set

and let's look at this one

so here i can look at

all the panes and that shows me the

information now what i can do is

i can install

and get the inbuilt data sets and then i

can simply do a plot

wherein i am doing a plot on the ChickWeight data

set so let's see what does that show it

summarizes the relationship between four

variables in the ChickWeight data frame

which is

in r's built-in datasets package now

from these plots we can see for example

weight varies systematically over time

you can also see that chicks were

assigned to four different diets
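
The exploratory plot described above can be sketched as follows. This is a minimal sketch using only base R and its built-in datasets package.

```r
library(datasets)  # ships with R and is normally attached already

# Inspect the four variables: weight, Time, Chick and Diet.
str(ChickWeight)

# Calling plot() on the whole data frame draws every variable
# against every other one in a grid of small scatter plots.
plot(ChickWeight)
```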

now when we talk about explanatory data

analysis

or visualization that shows others what

we found in the data this means we need

to make some editorial decisions what

features we would want to highlight for

emphasis

what features are distracting or

confusing and you want them to be

eliminated

right so there are different ways of

doing it now when you talk about your

graphics or visualizations you have

i would say

three different types or you can say

four so you have the base graphics which

is easiest to learn now here we are

having an example of base graphics where

i can use the base graphics

i can get a

data set using library

then i can simply use the plot

function to

generate a simple scatter plot of

calories versus sugars

from the UScereal data frame in the MASS

package

and then i can give it a title so this

is basically a simple example of base

graphics now you also have what we call

as grid graphics which is powerful set

of modules for building other tools

now you also have lattice graphics which

is general purpose system based on grid

graphics and then you have your gg plot

2 which implements grammar of graphics

and is based on grid graphics so you

have different ways now here since i

already have used library and i have the

data set i can just do a x so i can

assign the

sugar related values to x and calories

related value to y

then i can use one more which is library

function and calling in grid now i can

basically use functions such as pushViewport

if i would want to create a

plot using your grid graphics to create

the similar kind of plot which we

created using base graphics but this

will give you much more power than base

graphics

it will have a steep learning curve but

it is usually useful so i can do this

where i'm saying pushViewport

then i can basically say i would want to

have a data viewport

i would say different functions of your

grid package so i'm saying rectangle you

have x axis y axis given some points

here

and then basically you can add details

to the graph by giving the names to the

columns

and you can basically create a simple

grid graphics based plot here
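
The base graphics and grid graphics versions described above can be sketched as follows, assuming the MASS package (which normally ships with R) is available. The plot title and the axis label text are illustrative choices.

```r
library(MASS)  # provides the UScereal data frame
library(grid)  # low-level graphics system that ships with R

# Base graphics: one call draws the scatter plot, a second adds a title.
plot(UScereal$sugars, UScereal$calories)
title("Calories vs. sugars")

# Grid graphics: the same kind of plot assembled from low-level pieces.
x <- UScereal$sugars
y <- UScereal$calories
grid.newpage()
pushViewport(plotViewport())      # region with margins for the axes
pushViewport(dataViewport(x, y))  # scales that match the data
grid.rect()
grid.xaxis()
grid.yaxis()
grid.points(x, y)
grid.text("UScereal$calories", x = unit(-3, "lines"), rot = 90)
grid.text("UScereal$sugars", y = unit(-3, "lines"))
popViewport(2)
```

The grid version takes more code, but every element of the plot is placed explicitly.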

now there are different other options

which we can use to create plots now

before we go into understanding how you

create plots let me just give you a

brief on

what are the different kind of plots and

how they can be used so here we will

look at these different plots now for

example

we have a bar chart which is a graph

which shows comparisons across

discrete categories

so you have x axis which will show the

categories being compared and y axis

which represents a measured value

and height of the bars are proportional

to measured values

now

to create different kind of charts you

can use ggplot which is a package for

creating graphs in r

it is basically a method of thinking

about and decomposing complex graphs

into logical subunits and that is a part

of the tidyverse ecosystem so it takes each

component of a graph such as axes you can give

scales you can give colors you can give

the objects and you can build graphs on

particular data you can modify each of

those components in a way that's more

flexible and user friendly you can if

you are not providing details for the

components then ggplot will use sensible

defaults

and this basically makes it a powerful

and flexible tool now here

are

different options when you use your

ggplot such as you can use geoms or what

we call geometric objects

to form the basis of different types of

graphs you have geom_bar for bar charts geom_line for line

graphs for scatter plots that is

geom_point you have geom_boxplot

for box plots you have geom_quantile for

continuous x geom_violin for richer display

of distribution and geom_jitter for small

data so here is a simple example

where i would not go into too many

details here but you can just have a

look at this one where we are

using library function to get the

ggplot2 package

then basically we would want to look

into the mpg data set we would want to

look at the structure of it

and then we can basically get the

tidyverse package finally we can create a

bar chart

using geom_bar

and we can basically also mention what

would be in x-axis now you can also give

different colors to basically add more

meaning to your data

you could also go for stacked bar charts

so here we are actually telling ggplot

to map the data in the drv column to

the fill aesthetic so here i am giving

the aesthetic x as class

and i am saying what is the data we need

to have and then we are using

geom_bar

so you can also have dodged bars

in your ggplot that is bars which are

not stacked but placed next to each other

and you can create that by setting

your position as

position_dodge okay now you can obviously use

your different packages which are

inbuilt and you can create your bar

charts
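
The three bar chart variants described above can be sketched as follows, assuming ggplot2 is installed. The sketch uses the mpg data set that comes with ggplot2, and the plot variable names are illustrative.

```r
library(ggplot2)

str(mpg)  # look at the structure of the data first

# Simple bar chart: one bar per vehicle class, height is the row count.
p_bar <- ggplot(mpg, aes(x = class)) + geom_bar()

# Stacked bars: map the drv column to the fill aesthetic.
p_stacked <- ggplot(mpg, aes(x = class, fill = drv)) + geom_bar()

# Dodged bars: same mapping, but bars sit next to each other.
p_dodged <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = position_dodge())

print(p_bar)
print(p_stacked)
print(p_dodged)
```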

and you have other kind of graphs such

as line graph which is basically a type

of graph that displays information

as a series of data points connected by

straight line segments such as this one

and for this one we are using if you see

geom_line

now you can also create a scatter plot

which is a two dimensional

data visualization that uses points

to graph the values of two different

variables one on the x axis one on the y axis

like what we saw in base graphics

example

and they are mainly used if you would

want to assess the relationship or lack

of relationship between two variables

and you also have histogram which i

mentioned is mainly to look at the

distribution of a data to look at the

central tendency of the data

basically looking at

your

large amount of data for a single

variable you would be interested in

saying where is

more data found in terms of frequency

where is less data found in the graph

how close the data is towards its

mid point or what we call as mean median

mode

so you can use histogram where you can

categorize the data in what we call as

bins so these are some basics on

different kind of graphs now we can look

at some examples and see how that works

so what we were seeing is some quick

examples of base graphics or grid

graphics now here

let's do

an example of pie chart for different

products and units sold so you want to

create a graph for this first let's

create a vector and pass in the value

here

now i can also create labels which i

would want to assign to these values

and then basically i can plot the chart

by saying pie so that's the kind of chart

which i would want to create

and i would say the data would be x

and labels

so let's do this and that shows me a

simple pie chart now i can also give

main details here so instead of just

doing a pie x comma labels i can say what

is the main

and then what kind of coloring it should

follow so this is the way you can create

a simple

uh plot now i can also

find out what is the percentage

and

then basically

i would be interested in plotting the

pie chart which takes x

which takes the labels which will be the

percentage which we are calculating here

by doing a round function

and then you can basically give details

to your

graph you can say what color it follows

you can basically look at the legend

where it needs to be

in your chart

what are the values

and then basically fill up the colors so

let's run this one

and that shows me the percentage which

was calculated and it gives me the

details
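
The pie chart steps described above can be sketched as follows, using base R only. The product names and unit counts are hypothetical, and the titles and colors are illustrative choices.

```r
# Hypothetical products and units sold.
x      <- c(11, 30, 39, 20)
labels <- c("Product A", "Product B", "Product C", "Product D")

# Simple pie chart with a title and one color per slice.
pie(x, labels, main = "Units sold", col = rainbow(length(x)))

# Same chart labelled with percentages, plus a legend for the names.
pct <- round(100 * x / sum(x), 1)
pie(x, labels = pct, main = "Units sold (%)", col = rainbow(length(x)))
legend("topright", legend = labels, fill = rainbow(length(x)), cex = 0.8)
```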

and we can always have a look at our

plot now if you would want to go for a

3d pie chart then you can get the

package which is plotrix

let's use that by calling in the library

function let's pass in some data to x

and let's give some values or labels

which will make more meaning to the data

and then let's plot the 3d graph so i'm

saying pie3D here where i'm using x and

labels

then i'm basically doing an explode

which will basically control how your

graph looks like and basically give the

values so it also takes the title when

you say main as pie chart of countries

now let's create

data for graph so again we are having a

variable here we are using the c

function to create a vector

and then let's create a histogram for

this one

where i would say x lab what would be

your data around x axis what is the

color what is the border and here i am

creating a simple histogram

which as i discussed earlier will always

show

your values on the x axis and y axis is

more of frequency and then you can look

at the set of values and what is their

frequency

and we can basically use this histogram

for exploratory data analysis look at

the data try to understand what is the

central tendency of your data values

now we can also give some limits by

using xlim and ylim and then i can

also specify what is the limit so we

have given some values here wherein we

have said your x limit is 0 to 40

and y limit is 0 to 5. now if you

compare this with the previous one which

we had created

this one

based on the frequency had taken the

limits but we can assign limits

explicitly by giving this and then

create a histogram which makes more

meaning
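
The two histograms described above can be sketched as follows, using base R only. The values in `v`, the axis label and the colors are hypothetical choices.

```r
# Hypothetical values to plot.
v <- c(9, 13, 21, 8, 36, 22, 12, 31, 33, 19)

# Default histogram: R chooses the axis limits from the data.
hist(v, xlab = "Weight", col = "green", border = "red")

# Same data with explicit axis limits and a suggested number of bins.
h <- hist(v, xlab = "Weight", col = "green", border = "red",
          xlim = c(0, 40), ylim = c(0, 5), breaks = 5)
```

The object returned by hist holds the bin boundaries and counts, which is handy when you want to check what was drawn.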

now let's take

another data set that is air quality

let's view this to see what does that

data contain so you have

ozone solar wind temperature month and

the day so this is the kind of

information we have in the air quality

now let's use the plot function to draw

a scatter plot where as i mentioned you

would be interested in analyzing

variables and see

what is the relationship between them so

to plot a graph between ozone and wind

values

so we will say plot we will say the data

which is air quality from that i would

be interested in the ozone column or

ozone field and the wind field i can

create a plot based on this
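
The scatter plot described above can be sketched as follows, using the airquality data set that ships with R. The color is an illustrative choice.

```r
# Look at the columns first: Ozone, Solar.R, Wind, Temp, Month, Day.
head(airquality)

# Scatter plot of ozone against wind; rows with missing values are skipped.
plot(airquality$Ozone, airquality$Wind)

# Same plot with a point color set explicitly.
plot(airquality$Ozone, airquality$Wind, col = "red", type = "p")
```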

now i can also be saying what should be

the color what is the type of the data

which you would want to create and you

can look at the information so you

can create a histogram you can create a

scatter plot to basically understand the

data better and then infer some

information from that data so let's take

the air quality data set itself without

specifying any particular column and you

can create a plot which shows me all the

different values which you have in the

data and it basically shows you the

difference this is more of an example

like what we did for ChickWeight where

we did a base graphics now you can

assign labels to the plot so that is

when you are creating a plot you can say

air quality you will say ozone

and then that's your ozone concentration

you have your y lab which is the number

of instances

you have what is the title ozone levels

in new york city what is the color so

these are the details what we have given

with our plot function and let's look at

the data so it just tells me that this

is the ozone concentration

uh the number of instances what you have

and you are looking at the data now we

could also create a histogram by picking

up a particular column that is such as

solar

from your air quality and that basically

shows me the frequency

of solar values and we can then try to

find out what is the mid

what is the mean what is the standard

deviation and so on you can also look at

your histogram and try to understand if

it is left skewed or right skewed so we

can do that
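
The labelled plot and the solar histogram described above can be sketched as follows, using the built-in airquality data set. The label text follows the narration, and the column name `Solar.R` is the one used in that data set.

```r
# Plot of the ozone readings with axis labels, a title and a color.
plot(airquality$Ozone,
     xlab = "Ozone concentration",
     ylab = "No. of instances",
     main = "Ozone levels in New York City",
     col  = "green")

# Histogram of solar radiation; missing values are dropped by hist().
h_solar <- hist(airquality$Solar.R)
```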

now here let's get the temperature out

from this particular data set

let's create a histogram on temperature

and that basically shows me the

frequency of the temperature values

and

what values have the most frequency or

most occurrence

now you can create a histogram

with

labels

so let's do that with the limit and then

let's also use text to basically give

the values which also takes the values

and for each set of frequency or each

set of values it gives me the labels

now you can have a histogram with

non-uniform width so you could do that

by doing a hist function

and then

passing in your temperature you can say

what will be the main what is the title

what will be your x lab it will tell you

a limit around x axis what is the color

what is the border

what are the breaks you would want to

have

for your bars and you can

simply create a histogram using this so

this basically takes the breaks which we

have given

such as 55 to 60

60 to 70 70 to 75 and so on so this is

basically creating a histogram with

non-uniform width

and it purely depends on the kind of

values what you have
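
The labelled histogram and the histogram with non-uniform widths described above can be sketched as follows, using the built-in airquality data set. The title, axis label, colors and the exact break points beyond those read out in the narration are assumptions.

```r
temp <- airquality$Temp

# Histogram with the count written above each bar.
h <- hist(temp, ylim = c(0, 40))
text(h$mids, h$counts, labels = h$counts, adj = c(0.5, -0.5))

# Histogram with bins of unequal width, set through breaks.
h2 <- hist(temp,
           main   = "Maximum daily temperature",
           xlab   = "Temperature in degrees Fahrenheit",
           xlim   = c(50, 100),
           col    = "chocolate",
           border = "brown",
           breaks = c(55, 60, 70, 75, 80, 100))
```

With unequal bin widths hist draws densities instead of raw counts, so bar areas stay comparable.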

now you can also create a box plot which

sometimes helps us in understanding

the data quartiles and also understanding

our outliers so you can create multiple

box plots based on the data from air

quality so we'll select all the data and

then we'll do some slicing on the data

so let's create a box plot which tells

me the values and if you look at these

points here

like single dots these are basically

your outliers

we can learn about that more in later

sections
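
The multiple box plots described above can be sketched as follows, using the built-in airquality data set. Selecting the first four columns is an assumption about the slicing mentioned in the narration.

```r
# One box per column: Ozone, Solar.R, Wind and Temp.
bp <- boxplot(airquality[, 1:4], main = "Multiple box plots")

# Points drawn beyond the whiskers are the outliers.
bp$out
```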

so you can use

the ggplot2 library to analyze

a particular data set so for that we

will first

use the install dot packages and get

ggplot2

so it says do you want to restart r and

i can say yes so let it get the package

i think the package was already there

and now

let's look at

using ggplot2 so for that i have the

library function

and let's do a attach where i'm getting

the data set which is mtcars

now then i will create a variable p1 i

will use ggplot i will pass in my data

i'll give the aesthetics

what is the columns which you would be

interested in

and then you are using geom_boxplot

to basically create a plot

which gives me the box plot for the

values here and this is based on

the cylinders which is there in your

data

so we can always look at what does our

data contain

and what kind of values or features are

available in the data now let's create a

box plot we will also use the coordinate

function and that basically gives me

based on the data so i have changed the

coordinates now if you

look at the previous one where we

created a plot we had mileage on the y

axis and cylinders

on the x-axis

now i did a coord_flip and that's

like your transpose function so you have

created the box plot but you have just

flipped the coordinates you can create a

box plot and then say fill

which is the factor of cylinder so that

can be used to fill up the values in

your box plot
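A sketch of the three ggplot2 box plots just described: the basic one, the flipped one, and the filled one. It assumes ggplot2 is installed, and it refers to columns through the `data` argument instead of `attach()`; the variable names `p1` to `p3` are only illustrative.

```r
library(ggplot2)

# mileage (mpg) on the y axis, one box per number of cylinders on the x axis
p1 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# same boxes, with the x and y axes swapped
p2 <- p1 + coord_flip()

# fill each box with a colour chosen by the cylinder factor
p3 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot()

print(p1)
print(p2)
print(p3)
```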

now what we can also do is

we can create factors so we have learnt

about factors earlier which is usually

used to work on categorical variables

so here let's create a factor

which is mtcars gear you have am you

have cylinder

and if you look at the factors which we

have created we have passed our data

what is the field or the column we are

interested in

what is the level of values there and

what are the labels for those values

right so we have learnt about factors

you can always look into the previous

section and learn more about factors
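Creating those factors might look like the sketch below. The label texts are assumptions chosen for illustration, and the work is done on a copy called `cars` so the built-in `mtcars` stays untouched.

```r
cars <- mtcars  # work on a copy

# factor(data column, levels = values present in the column, labels = display names)
cars$gear <- factor(cars$gear, levels = c(3, 4, 5),
                    labels = c("3 gears", "4 gears", "5 gears"))
cars$am   <- factor(cars$am, levels = c(0, 1),
                    labels = c("automatic", "manual"))
cars$cyl  <- factor(cars$cyl, levels = c(4, 6, 8),
                    labels = c("4 cyl", "6 cyl", "8 cyl"))

levels(cars$gear)  # the labels become the factor levels
table(cars$cyl)    # how many cars fall into each level
```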

now let's create a scatter plot

by using the ggplot function again we

will use the data as mtcars i will

go for mapping option and then i will

give my aesthetics that is what would be

x what would be your y

and you also would want to use what kind

of

function you are using so let's go for

geom_point and that basically helps

me in creating a scatter plot now you

can create a scatter plot by factors

so here we will say ggplot

so notice in all of these cases

depending on the kind of data you have

depending on the kind of plot you are

interested in you will use the ggplot

and then basically a function with that

or the inbuilt package so here i'm

saying data is mtcars i am going for

mapping which basically will take the

values for your x and y

what is the color

and the coloring will be done based on

the factor values now if you remember

factors will obviously have some levels

and

those levels will basically help you in

differentiating between your categorical

variables so i'm saying as.factor on

cylinder and then i'm using geom_point

to basically create this scatter plot so

let's do this
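For reference, a scatter plot coloured by a factor can be sketched as follows on a fresh copy of `mtcars`. The choice of `wt` and `mpg` for the axes is an assumption; the transcript does not name the columns.

```r
library(ggplot2)

# weight on x, mileage on y, point colour picked by the cylinder factor
p <- ggplot(data = mtcars,
            mapping = aes(x = wt, y = mpg, color = as.factor(cyl))) +
  geom_point()

print(p)
```

Because `cyl` is converted with `as.factor()` inside `aes()`, ggplot2 uses a discrete palette with one colour per level instead of a continuous gradient.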

and

i can

look at the values of this one so it

says

there is an error which says

must request at least one colour from a hue

palette so let's look at that one so the

error which we were facing when we gave

color as the factor values was because

when you look at these factors which

were created with some labels if we look

at the values of these it tells me there

are NA values in that particular column

similarly your gear

or similarly you can completely look at

the complete data set it tells me

cylinder you have am you have gear now

these have some

we have created some labels but these

have n a values
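The video does not show why the columns ended up as NA. One common cause, offered here only as an assumption, is running the same `factor()` call a second time on a column that has already been relabelled, as this sketch on a copy of `mtcars` shows.

```r
cars <- mtcars  # work on a copy

# first run: the numeric codes 4, 6, 8 are matched and relabelled
cars$cyl <- factor(cars$cyl, levels = c(4, 6, 8),
                   labels = c("4 cyl", "6 cyl", "8 cyl"))

# second run: the values are now "4 cyl", "6 cyl", "8 cyl",
# none of which matches the levels 4, 6, 8, so every entry becomes NA
cars$cyl <- factor(cars$cyl, levels = c(4, 6, 8),
                   labels = c("4 cyl", "6 cyl", "8 cyl"))

summary(cars$cyl)       # shows the NA count for the column
colSums(is.na(cars))    # NA count for every column of the data frame
```

Checking `colSums(is.na(...))` before plotting is a quick way to spot a column that has lost all of its values.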

so what we can do is we can create a

scatter plot as we did earlier by giving

the aesthetics and that's a simple

scatter plot

wherein i'm also using geom_point so

that i can have these points by defaults

or with defaults

you can also

give a color specific basically if you

would want to have different kind of

data in the same plot or i can

create scatter plots by different sizes

by giving a size or

i can give a color and size and that's

again one way in which you can create

your scatter plots now let's also see

how you can visualize

one more data set which is mpg

so i can also do it in this way where i

say ggplot2::mpg

and then pass and look at the data set

what we have here

you can just do a view on this to see

what my data contains if the fields have

any NA values if that's going to affect

your plotting so now what we can do is

we can create a bar plot or a bar chart

so i am saying ggplot the data would be

as we have given in previous lines that

is ggplot2::mpg then i will say what

should be in my aesthetics and what kind

of

chart are you going to create so i'm

saying geom_bar so that's my

bar chart and that has basically your

class and count now you can create a

stacked bar chart where your information

is stacked in the same bars

and we are still using the same data

we are going for aesthetics which is

class and then when you say geom_bar

which creates your stacked bar we will use

fill

which is drive and we can always go back

and look at our data for example

you can always look into this so you

have the drive column here

and you are also working on this

complete data set so let's go ahead and

create a stacked bar chart and that

basically gives me the information where

you have the drive information which is

stacked

here now you can do a dodge

by giving the position as dodge

so we are still going to go for the same

chart but this time the bars will be

next to each other and that can also be

done which is very useful
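The three bar charts on the `mpg` data set (plain, stacked, and dodged) can be sketched like this, assuming ggplot2 is installed. The variable names are illustrative.

```r
library(ggplot2)

# plain bar chart: one bar per vehicle class, height = number of rows
bars <- ggplot(ggplot2::mpg, aes(x = class)) +
  geom_bar()

# stacked: each bar is split by drive type (the drv column)
stacked <- ggplot(ggplot2::mpg, aes(x = class)) +
  geom_bar(aes(fill = drv))

# dodged: the same split, but the pieces sit next to each other
dodged <- ggplot(ggplot2::mpg, aes(x = class)) +
  geom_bar(aes(fill = drv), position = "dodge")

print(bars)
print(stacked)
print(dodged)
```

`geom_bar()` counts the rows itself, which is why only an x aesthetic is supplied.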

you can also use geom_point

where you are mapping and you are

specifying what are your aesthetics so

we were creating a scatter plot

now you can also use

or give more details where you can say

color can be based on the class

and we have different classes and based

on that my points have been colored

now you can also use the

plotly library so let's install

this one

i will say yes for example let it

basically restart so that all my

packages are updated

then i can access that package using

library function

and then

create a variable

to which you are assigning your

plot_ly plot so data is mtcars

what will be your x-axis what will be

your y-axis and details on your marker

which we have given

wherein i will give a list which is size

color which is a combination

and then you have your line

what kind of color it will have and what

will be the width so this is where i'm

going to use plotly

and let's look at this plot
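A rough sketch of the plotly call described above, assuming the plotly package is installed. The axis columns (`wt`, `mpg`) and the colour strings are assumptions for illustration; the transcript only says that the marker takes a list with size, colour and a nested line list.

```r
library(plotly)

p <- plot_ly(
  data = mtcars,
  x = ~wt,             # x axis column
  y = ~mpg,            # y axis column
  type = "scatter",
  mode = "markers",
  marker = list(
    size  = 10,
    color = "rgba(255, 182, 193, 0.9)",
    # outline of each marker: its colour and width
    line  = list(color = "rgba(152, 0, 0, 0.8)", width = 2)
  )
)

# typing p at the console (or in RStudio) renders the interactive plot
```

Stating `type` and `mode` explicitly avoids some of the messages plotly prints when it has to guess them.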

so it basically gives me some

information now we see some warnings

which are getting generated but

you don't need to worry about that so

you can look at the packages what you

have

and what options you are using so

similarly we can create one more plot

using plot_ly and look at the values of

those so that's a plot with a trend

which explains me about my data

so this is a simple small tutorial on

understanding or

how you can have your graphics or

visualization

used to understand your data obviously

there are many more examples many more

options which you can pass into your

plot functions

or your ggplot

and the inbuilt

packages which are available in r for

your visualization now that could be for

exploratory data analysis or explanatory

data analysis so try these graphs and

see if

you can change these options and try or

create new visualizations

now

let's do a hands-on project to perform a

time series analysis using r programming

in this project

we'll be using time series energy data

to explore the variations in electricity

demand and renewable energy supply over

time

over to ajay now welcome to this session

where we will learn on time series

analysis using r programming language

so this is basically a mini project

where we will look at time series data

and how we can analyze it visualize it

to basically find some

important information or gather insights

from the data now when you talk about

time series analysis time series is

basically any data set where your values

are measured

at different points in time

so when you talk about time series data

data is usually

uniformly spaced at a specific frequency

for example hourly weather measurements

you have daily counts of website visits

monthly sales total and so on so when

you talk about time series that can also

be irregularly spaced and sporadic for

example time stamped data in computer

systems event logs or history of 911

emergency calls

now when we work with time series data

for example here i am taking an energy

data set we can see how techniques such

as time based indexing resampling

rolling windows can help us explore

variations in electricity demand and

renewable energy supply over time now

here we will look at some aspects of

this data set which i am considering so

this is the open power system data

set and here is the data set i have we

can look at the data set now this is in

a simple format it has time

it basically has values for consumption

and then you have data for wind and

solar and wind plus solar so in certain

cases you have only the date and the

consumption but then if we scroll down

we will also find

data for wind solar wind plus solar and

so on so this is a time series data set

which we would want to work on

sometimes you may also have the data

collected which just does not have the

time but it may also have

time stamp that is it would have say

hour minutes and seconds and that can

also be worked upon so let's consider

this data set and let's work on this

project where we will analyze this time

series data set

now here we can work on this time series

data we can basically create some data

structures out of it such as data frames

we can do some time based indexing we

can visualize the data we can look at

the seasonality in the data look at some

frequencies and also do some trend

detection

now when you talk about this data set it

has electricity production and

consumption which is reported as daily

totals in gigawatt hours

and here are the columns of the data

which i was just showing you so you have

date you have consumption you have wind

you have solar and wind plus solar so

this is the data we have and we will

basically explore say electricity

consumption and production in germany

which has varied over time so some of

the questions which we can answer here

is when is electricity consumption

typically highest and lowest how do wind

and solar power production vary with

seasons of the year

what are the long-term trends in

electricity consumption solar power and

wind power how do wind and solar power

production compare with electricity

consumption and how has this ratio

changed over time

we can also do wrangling or cleaning of

this data or pre-processing of data and

create a data frame and then we can

visualize this

now let's see how do we do that so i

will open up my rstudio and let's look

at the data set so here is the data set

now i'm picking it up from my machine

you can also pick it up from github so

all the data sets or similar data sets

can be found in my github repository and

here

i can look in the data sets you will

find

a lot of different data sets here there

are some time series data sets such as

power

i can search for power or you have

basically

coal

or you have this

opsd

germany daily data set and there are

many other data sets which you can work

on

now to

get the documentation on this project

you can also look in my github

repository and you can search for

repositories

and then basically you can look in data

science and r

and here there is a project folder where

i have given the documentation sample

data set and also

your time series analysis related

document this is also the code which you

can directly import in your r studio and

you can practice or work on this project

so let's see how does that work

so first thing is we will create a data

frame

from this data set now here if you see i

am using header as true so that it

understands the heading of each column

i'm also giving row.names and i'm

specifying date so there is this date

column in the data set as i showed you

earlier let's look at it again so you

have date consumption wind solar wind

plus solar so you can suggest that date

should become the index column which can

be useful so you can do this now let's

just

create this

let's look at what does this data frame

contain
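A minimal sketch of this loading step. To stay self-contained it reads a tiny inline CSV with made-up numbers in the same column layout instead of the real file; with the real data you would pass the file path to `read.csv()` instead of `text =`.

```r
# made-up rows in the same layout as the data set (values are NOT real)
csv_text <- "Date,Consumption,Wind,Solar,Wind+Solar
2006-01-01,1000,,,
2006-01-02,1300,,,
2017-12-30,1200,700,10,710
2017-12-31,1100,720,20,740"

# header = TRUE reads the column names from the first line;
# row.names = "Date" turns the Date column into the row index
mydata <- read.csv(text = csv_text, header = TRUE, row.names = "Date")

head(mydata, 2)  # first rows: the early years have NA for wind and solar
tail(mydata, 2)  # last rows: wind and solar are filled in
dim(mydata)      # rows and columns (Date is no longer counted as a column)
str(mydata)      # data type of each column
```

`read.csv()` rewrites the header `Wind+Solar` to the syntactically valid name `Wind.Solar`, which is why the column shows up with a dot.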

and here if you see it shows me some

data which

has been

now as a part of this data frame

structure

it starts with consumption wind solar

wind plus solar and if you see this one

is becoming my index column so i can

always do a head and look at part of the

data frame using head or tail so look at

the first records so let's see this now

that shows me the head data i can also

do a tail and look at the

ending values so if you closely see here

we have wind

solar

wind

dot solar and that basically has n a

values so there are missing values but

let's look at the tail and that tells me

that there is some data available for

wind and solar and wind solar

now we can always look in a tabular

format using view

and we can look at the data so this

shows me that there are values in these

columns we see NA values but if i

really scroll down

i can see

some values which would be available for

wind and solar and wind solar so i can

just use view now i can look at the

dimensions of this particular object

and that tells me there are

4384

rows and four columns you can always

look at the structure that is check the

data type of each column which can be

very useful so if i see here i don't see

the date column because date column was

considered as an index which can be

useful but i also look at my other

columns they are of the num types so

that's the data type for each

attribute or each column here

now we would be interested in looking at

this date column so let's look at the

data type of this date column

now if i try to do this this will show

me that this is null because date as a

column does not exist because we created

it as an index so if i look at row names

and then i search

for my data show me the index column or

row.names it tells me these are the

values that's the date column

which we are seeing here now we can

access a specific row by just doing a my

data

and give the index value or row name

value so let's look at that and that

shows me based on this index you are

looking at the value

you can obviously search for a different

date

something like this you can also pass in

a vector and you can give

range of values so that is 1st of january 2006 to 4th

of january and we can look at this one

so it shows me

these are the values so here actually

i'm not giving a range but i'm just

selecting multiple values from row.names
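Row-name indexing as described above can be sketched like this, again on a small inline data set with made-up values so that the block runs on its own.

```r
# made-up rows (values are NOT real), with Date used as the row index
csv_text <- "Date,Consumption,Wind,Solar
2006-01-01,1000,,
2006-01-02,1300,,
2006-01-03,1400,,
2006-01-04,1450,,"
mydata <- read.csv(text = csv_text, header = TRUE, row.names = "Date")

mydata$Date          # NULL: Date is the index now, not a column
row.names(mydata)    # the dates live here, as character strings

# one row, looked up by its row name
mydata["2006-01-02", ]

# several rows: this picks exactly the listed names, it is not a range
mydata[c("2006-01-01", "2006-01-04"), ]

summary(mydata)      # min, quartiles, median, mean, max (and NA count) per column
```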

now we already know that in r you have a

summary function so you can always do a

summary and that gives you

for each column it gives you minimum

first quartile median mean third

quartile and maximum values so we are

looking at consumption we are looking at

wind solar and wind dot solar

now this is good but then if i would

want to really visualize the data access

the data do some analysis then it would

be good to

take all the columns and then we can

later decide to change the data type of

say date column if we want to use it so

earlier i was using date as row.names or

the name of the rows or index what you

call

in any other programming language so

here i will just use my data set and

i'll say header is true i'm calling it

mydata2 let's look at the data and this

one shows me

five columns where in my first column is

the date

consumption wind solar and so on now

looking at the structure

so let's look at the data type

so it tells me that if now

i'm interested in looking at the date

column from mydata2 data frame it

tells me it is a factor with 4384

levels and these are the values

so

it is not in a date time format it's a

factor

now what we can do is we can convert

this into a date format how do we do

that so let's have a variable x and i'm

going to use as.Date function and

i'm going to pass in my date column so

that's

assigned to x now let's look at the head

of x and it shows me the values we will

also see what kind of class it is

and we will look at the structure of x

so class already says it is date type

and look at the structure so it shows me

the format

now we have converted this column or

column related value into x now how do i

basically

extract values out of it or make it a

part of data frame so first i will use

so once it has been converted into

date format i will go for as.numeric

and here i will create a variable called

year and i will just do a format on x

which is basically of date type and then

i am saying

percent capital y (%Y) so that will get me the year

component out of this let's look at the

values

that shows me year component

now similarly we can get the month out

of this and then basically look at the

month values we can get the day out of

it and we can get the day component now

if i look at my data 2 which we had

created earlier this basically had date

consumption wind solar wind solar so

what i can do is i can add these

extracted columns such as year month day

to my data frame using a cbind that is

column bind and i will assign it to

mydata2 again so let's do this and now

if you look at head it shows me

date so that should be date format

consumption now this one might not be

date format but we'll see you have

consumption wind solar and we have

extracted the year month and day which

can help us for group by we can do some

aggregations we can do a plotting and we

can do various things by these

additional columns now let's look at

first three rows here so i'll say one is

to three for my data two and that shows

me some data here you can always do a

head and look at the sample of data so

that basically shows me month

day

your columns and then you have your date
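The conversion and extraction steps can be sketched as follows on made-up rows. `stringsAsFactors = TRUE` is passed explicitly to reproduce the factor column seen in the video; since R 4.0.0, `read.csv()` returns character columns by default, and `as.Date()` works on either.

```r
# made-up rows (values are NOT real); Date stays an ordinary column this time
csv_text <- "Date,Consumption,Wind,Solar,Wind+Solar
2006-01-01,1000,,,
2006-01-02,1300,,,
2017-12-30,1200,700,10,710
2017-12-31,1100,720,20,740"
mydata2 <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)
str(mydata2$Date)             # a factor, not a date

x <- as.Date(mydata2$Date)    # convert to Date type
class(x)

# pull the components out as numbers
year  <- as.numeric(format(x, "%Y"))
month <- as.numeric(format(x, "%m"))
day   <- as.numeric(format(x, "%d"))

# column-bind the new vectors onto the data frame
mydata2 <- cbind(mydata2, year, month, day)
mydata2[1:3, ]                # first three rows
```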

now what we can do is we would want to

visualize this data we would want to

basically understand the consumption now

as i said

if we want to visualize the data say for

example i want this which is consumption

of data over years and this one is in

terms of gigawatt hours as we were

mentioning here so if i

would want to create this visual to

basically understand the pattern of the

data

how do we do it so we can create a

line plot of full time series of

germany's

electricity consumption using the plot

method now how do we do that so here

one of the option is i can straight away

use the plot method

i can then say what would be in my

x-axis what would be on my y-axis

what would be the type of

graph i would want to plot what is my

name on x-axis y-axis and this is the

simplest way so i'm saying my data 2 i'm

extracting the year column

and here i'm taking the consumption so

let's create a plot

and here if you see we are looking at a

plot we do see some tick marks and we

see that the data has been divided with

every two years so from 2006 onwards to

2016 but then really this data does not

give me

uh you know a very useful way of looking

at the rate or understanding it might be

what i can do is i can use the same way

but i can give apart from x-axis and

y-axis i can say

the

limits that is x limit is 2006 to 2018

and y limit is from 800 to 1700 so we

can do this and let's look at this again

this is a plot but it really does not

help me in visualizing and understanding

the data so what are the better options

i can go for multiple plots in a window

as of now we are just sticking to one

plot in window so if you would want to

have multiple plots you can always

change the value here and make it two or

three that will say how many rows and

how many columns so as of now we will

just keep it as it is par

mfrow now

if i would want to plot i can straight

away give the column name so i am

interested in getting the consumption

now i can just do a plot i'll say

mydata2 and i will choose the second

column which is consumption which we saw

here

from our data so consumption was the

second column so i can just do a plot in

a straightaway way without mentioning

your x-axis y-axis limits and so on and

if you look at this this one is giving

me

a pattern now here i am looking at

uh

x-axis y-axis which is not really named

we do not have a name to this graph

and we are looking at the data it does

show me some kind of pattern but might

be we can make it more meaningful so i

can do it this way where i say my data

second column let's give x axis as year

x axis y axis is consumption

now that has changed

the x-axis and y-axis now i can also

give some more details i can say type

should be line

i have the line width i'm saying color

is blue

and let's do this so this looks more

meaningful might be shows a wavering

pattern of consumption over years

i can also give a

limit of x that is 0 to 2018 and that

basically shows me the range now we can

change that and we can be more specific

and saying x limit should be 2006 to

2018

and let's look at this now this one once

you have given a proper limit it shows

the line graph and it shows what was the

consumption in 2006 and over a period

till 2018.

i can then

use any of these options are fine but it

depends on what and whom you are

presenting the data or what kind of

analysis you are doing so i can do a

plot i can choose column second x lab

which is x axis

y axis type is line width giving x limit

y limit and then i'm giving a title to

this which is consumption graph

and then basically you are looking at

the line graph

now those are the options which you can

do either you could be very specific or

you could just

give

your column which you want to plot or

obviously make it more meaningful by

giving all the details
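The two styles of `plot()` call discussed above, sketched on a synthetic stand-in series (a yearly cosine wave, not the real consumption data) so the block runs without the CSV file.

```r
# synthetic daily series, an assumption for illustration only
dates <- seq(as.Date("2006-01-01"), as.Date("2017-12-31"), by = "day")
day_of_year <- as.numeric(format(dates, "%j"))
mydata2 <- data.frame(
  Date        = dates,
  Consumption = 1300 + 150 * cos(2 * pi * day_of_year / 365.25),
  year        = as.numeric(format(dates, "%Y"))
)

par(mfrow = c(1, 1))  # one plot per window

# year on the x axis: every day of a year shares one x value,
# so the points pile up in vertical strips and the plot is hard to read
plot(mydata2$year, mydata2$Consumption,
     xlab = "Year", ylab = "Consumption",
     xlim = c(2006, 2018), ylim = c(800, 1700))

# plotting the column by itself uses the row number as x and shows the pattern
plot(mydata2[, 2],
     xlab = "Day index", ylab = "Consumption (GWh)",
     type = "l", lwd = 2, col = "blue",
     ylim = c(800, 1700), main = "Consumption Graph")
```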

now what we can do is if we would want

to look at

this data and understand it better

rather than just looking at a simple

line i can take the log values so here

i'm saying log of

my data to second column so i'm taking

log values of consumption and i'm taking

the difference of logs so i can say

difference and then you can

basically increase or decrease this by

multiplying it by some number so rest

remains the same i'm changing the color

and let's look at this plot and you see

this basically is giving me a better

pattern which makes meaning here we see

the log values so this is you are using

a simple plot function
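The log-difference transform can be sketched as below, again on a synthetic stand-in series. The multiplier of 100 is an illustrative choice; it turns the result into an approximate day-over-day percentage change.

```r
# synthetic daily series, an assumption for illustration only
dates <- seq(as.Date("2006-01-01"), as.Date("2017-12-31"), by = "day")
consumption <- 1300 + 150 * cos(2 * pi * as.numeric(format(dates, "%j")) / 365.25)

# difference of consecutive log values, scaled by 100
log_change <- diff(log(consumption)) * 100

plot(log_change,
     xlab = "Day index", ylab = "Change in log consumption (x 100)",
     type = "l", lwd = 1, col = "red",
     main = "Daily change in log consumption")
```

`diff()` returns one value fewer than its input, because the first observation has no predecessor to compare against.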

in r you can also use ggplot now for

that we can install the ggplot package

it's already there in my machine so i'll

say no i will access this by using the

library ggplot2

and now i can use ggplot to plot so

the way you specify here you can say

mydata2 that's the data frame

i'm saying type as o and when i'm saying

line

i am basically going to

use x axis which is year y is

consumption and let's look at this plot

so again we are back to the one which we

were doing earlier really does not make

any sense

gives us some data but then really does

not give me enough information

i can

in my aesthetics i can say x is year y

is consumption i can do a grouping and

then i can give line and plot

so again we have some information but

really does not help me right

now let's look at other example so i'm

just doing the same thing here and i'm

looking at line type being dashed i'm

using ggplot's other methods such as

geom_line and geom_point to give me more

information and if i look at the

plot it does give me data it tells me

what are the different values it gives

me some kind of pattern but i would

still prefer the way we were doing with

plot
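A sketch of the ggplot2 version, assuming ggplot2 is installed and using a synthetic two-year stand-in series. The first plot follows the description above (year on the x axis, grouped by year, dashed line plus points); the second, with the full date on the x axis, is an added alternative and is not from the video.

```r
library(ggplot2)

# synthetic daily series, an assumption for illustration only
dates <- seq(as.Date("2016-01-01"), as.Date("2017-12-31"), by = "day")
mydata2 <- data.frame(
  Date        = dates,
  Consumption = 1300 + 150 * cos(2 * pi * as.numeric(format(dates, "%j")) / 365.25),
  year        = as.numeric(format(dates, "%Y"))
)

# as described: year on x, one group per year, dashed line and points
p_year <- ggplot(mydata2, aes(x = year, y = Consumption, group = year)) +
  geom_line(linetype = "dashed") +
  geom_point()

# alternative: the Date column on x spreads the observations out over time
p_date <- ggplot(mydata2, aes(x = Date, y = Consumption)) +
  geom_line(color = "blue")

print(p_year)
print(p_date)
```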

now

we can change the color and obviously

add details to it so what we see is when

you use the plot method which i did

earlier it was choosing pretty good tick

locations that is every two years and

labels the years for the x-axis which

was helpful

right but with these data points which

we were seeing here

or say for example this one

or say this one

or say this one we are looking at some

data but then that

really is quite crowded

and it is hard to read you can look at

the values but then it really does not

give you enough information so we can go

for plot method but then we will see how

we can consider different data now if i

would want to plot the solar and wind

time series so let's see how do we do

that

so wind column is what i'm interested in

so first thing is it was always good to

find out the minimum and the maximum

values in every column so i'm saying

minimum i'm saying let's put in here my

data 2

and then let's look at the values so we

are looking at the columns

we know consumption is the second column

wind is the third column

and

you have solar as the fourth and this

one is the fifth so let's say let's find

out the minimum of each of these columns

which we would want to plot so let's say

minimum of data third column and here

i'm also saying remove the n a values

because we do not want to consider the n

a values so let's look at the

minimum that shows me 5.7757

what is the maximum value it is 826 so

that also helps me in giving a limit if i

want to plot wind on y axis i can give a

y limit from 5 to 850
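Why `na.rm` matters here can be seen in a small sketch with made-up numbers (not the real data):

```r
# made-up values with gaps, standing in for the real columns
mydata2 <- data.frame(
  Consumption = c(1000, 1300, 1200, 1100),
  Wind        = c(NA, NA, 5.5, 820),
  Solar       = c(NA, 2, 150, 30)
)

min(mydata2$Wind)                        # NA: a single missing value poisons the result
wind_min <- min(mydata2$Wind, na.rm = TRUE)
wind_max <- max(mydata2$Wind, na.rm = TRUE)

# minimum and maximum of every column in one call
sapply(mydata2, range, na.rm = TRUE)

# round outwards to get a y-axis limit that contains all the data
wind_ylim <- c(floor(wind_min), ceiling(wind_max))
```

The resulting pair can then be passed as `ylim` to `plot()`.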

consumption wise let's find out the

minimum from second column and maximum

and similarly for solar find the minimum

and maximum and wind plus solar minimum

and maximum so this will be helpful when

you would want to plot multiple graphs

or

give some limits so that's fine now for

multiple plots as i said

instead of having one plot let's plot

consumption and wind and solar and try

to see a pattern so i can say par

function and i will say three rows and

one column

so now when i start plotting you will

see you will have multiple plots in one

single window so let's see how we do it

so here

let's look at plot one so this one is

consumption as we did earlier

and let's look at the data so that gives

me some data you can always do a zoom

and you can look at the data you can

basically expand this graph or you can

reduce this graph to see

what kind of pattern we have in

consumption similarly we can basically

choose

date being

x axis

my consumption being y axis right so

this is being more specific because here

we have a range but it really does not

give me enough information so i will

basically give

x-axis y-axis i will give the name that

is daily totals and then i will

basically give consumption color and y

limit based on my minimum and maximum

limits so let's do this

and now we can

look at the data here so let's see this

data

makes a little more meaning because we

are looking at the dates

and let me do a zoom so it shows me all

the dates it shows me the data points it

shows me

how the data

pattern is changing for consumption

now

this is for consumption so what we can

do is we can also extract specific data

so if you see here i have done some

testing where i am saying okay i would

want to

get

a date specifically

i would want to extract some value so we

are looking at the date column but if

you remember we did not change the data

type in the data frame we just converted the

date column into x and extracted year month out

of it

it would be good if we can

convert a column into date time format

and put that in our data frame now

let's look at the plot2

this is mainly for

your

column

which should be consumption and wind and

solar so here i see it is solar data and

i can plot this one

to see how it looks like

and that tells me from 2006

onwards we have some pattern

i can

be more specific where i say

i would be giving date and then

the

column for solar x-axis y-axis what is

the type what is the y limit and what is

the color

it is always good to specify your x and

y-axis give a name rather than let it

automatically pick up now this makes

more meaning because it shows me some

dates

similarly we can do for wind

so either you do it just by giving the

column

or you give your x and y axis so let's

look at this one

and this shows me the data so we can

choose plot three

this one we can choose plot two

we can choose plot one and we can put

all that data in one graph

so that's when you are putting in multi

plots in one particular graph you can

always do a zoom

you can always look at the data right

and this is usually useful to look at

the pattern what kind of pattern we see

what data we have and so on now moving

forward so we have seen how you are

creating these plots all in one window

let me reset this back to one plot per

window
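Stacking three plots in one window and then resetting the layout could look like this. All three series are synthetic stand-ins shaped only to mimic the seasonal direction described in the video (consumption and wind peaking in winter, solar in summer).

```r
# synthetic series, an assumption for illustration only
dates  <- seq(as.Date("2012-01-01"), as.Date("2017-12-31"), by = "day")
season <- cos(2 * pi * as.numeric(format(dates, "%j")) / 365.25)  # +1 in winter, -1 in summer
consumption <- 1300 + 150 * season
solar       <- 90 - 80 * season
wind        <- 200 + 100 * season

# three rows, one column; par() returns the previous settings
old_par <- par(mfrow = c(3, 1))

plot(dates, consumption, type = "l", col = "blue",
     xlab = "Date", ylab = "Consumption", main = "Daily totals")
plot(dates, solar, type = "l", col = "orange",
     xlab = "Date", ylab = "Solar")
plot(dates, wind, type = "l", col = "darkgreen",
     xlab = "Date", ylab = "Wind")

# back to one plot per window
par(old_par)
```

Saving the value returned by `par()` makes the reset independent of whatever layout was active before.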

and let's basically plot time series in

a single year so what we have seen is

that when you look at the plot method it

was quite crowded then we looked at

solar and wind and if you compare that

you will see your consumption pattern

your solar pattern your wind pattern and

basically we can see from this

particular data some kind of pattern

so electricity consumption is highest in

the winter

where we will

see what is the consumption

is it highest in winter or is it in

summer we can see that by breaking a

year

further into months we can see that but

we see a pattern which goes for every

year or every two years being highest at

a particular point of time and then it

drops down

so electricity consumption is highest in

winter and that might be due to

electrical heating

and increased lighting usage and lowest

in summer now when you look at

electricity consumption appears to split

into two clusters

we can always look at the consumption

one with oscillations centered roughly

around 1400 gigawatt hours so you can always

look at 1400 gigawatt hours and you see all

the values here which are in that

particular consumption another with

fewer and more scattered data points

centered roughly around 1150 so if you

really expand this you can see you will

have lot of data points at this point

now

we might guess that these clusters

correspond with weekdays and weekends

which we can see if you break that data

into yearly monthly weekly and so on now

if you look at solar production

that is highest in summer when sunlight

is most evident and lowest in winter so

obviously when you are making or

gathering some insights when you're

looking at the data you are also using

your domain knowledge your business

knowledge your

you know knowledge of business to

understand how this goes

if you look at wind power production

that's again highest in winters and

drops down in summer

so due to stronger winds and more

frequent storms and lowest in summer

so there is some kind of increasing

trend in wind power production over

years which we can see here

over the years

and

all the time series data what we are

looking at

is

referring or showing us some kind of

seasonality that is we are looking at

seasonality in which a pattern is

repeating again and again at regular

times

at regular intervals so if you look at

consumption solar and wind time series

that oscillates between high and low

values on a yearly time scale which we

can break down and see i'll show you

that

it corresponds with the seasonal changes

in weather over the year

so seasonality

does not have to correspond with

meteorological reasons for example if

you look at retail sales data

that will show you yearly seasonality

with increased sales in particular

months

so seasonality when we say can occur on

other time scales so the plots what we

are seeing here

they are fine but if you look at those

plots they might

show some kind of weekly seasonality

also

so in your consumption corresponding to

weekdays and weekend so let's plot for

one single year now how do i do that

so first is i will look at mydata2

that shows me the structure it shows me

date which is factor other columns which

are all numerics

now like we did earlier i'll repeat this

step where i'm going to convert the date

column into date type

look at head of it look at class of it

look at the structure of it right and

then what i want to do is i want to add

this

to my data frame so i will create a

variable called mod data

and this one will use as.Date and i'm

formatting

the value of x which is date time into

month day and year so let's do that

and now you look at the mod data which i

created like modified data so this is

the format i have it is in date type if

you carefully see here

and then i can look at the head of it

so it shows me the mod data

now

what we do here is create

mydata3

using cbind with mod data which is

going to add this column to the

other columns of mydata2 so my new

data frame is my data 3 let's look at

the structure of it and you see there is

this date column i can delete it i can

remove it i can let it be right so that

depends on our choice might be we want

to once our analysis done we want to

remove the mod data right so we can keep

both of them
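A minimal sketch of the conversion and column-binding steps described above. The three-row data frame is synthetic, and the names `moddate`, `mydata2`, `mydata3` and the `%Y-%m-%d` date format are assumptions, since the narration does not show the actual code or file.

```r
# Synthetic stand-in for the tutorial's data; values and date format are assumed
mydata2 <- data.frame(
  Date = factor(c("2017-01-01", "2017-01-02", "2017-01-03")),  # dates stored as a factor
  Consumption = c(1130.4, 1441.1, 1529.9)
)
str(mydata2)

# Convert the factor to Date; go through character so the labels are parsed
moddate <- as.Date(as.character(mydata2$Date), format = "%Y-%m-%d")
class(moddate)

# Add the new Date column in front of the existing columns
mydata3 <- cbind(moddate = moddate, mydata2)
str(mydata3)
```

Keeping the original `Date` column next to the converted one makes it easy to check the conversion before dropping either column.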

now let's

basically extract data for a particular

year now how do you do that so this is

some wrangling so i will say mydata4

let's call it mydata4 and i will use

subset function so subset will work on

my data 3 that's the data and what i'll

do is i will do a subset how do how is

the subset found so i'll say take the

mod data

column the value should be

greater than or equal to the start of 2017 and should

be less than

2017 december 31st so i'm getting data

for one year and i'm storing it as my

data four

let's get the head of it and you see we

are specifically looking at 2017 related

data
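A small sketch of extracting one year with `subset()`, as described above. The data frame is synthetic and the column name `moddate` is an assumption. This version uses inclusive bounds on both ends so that 31 December is kept; the exact comparison used in the video is not visible in the narration.

```r
# Synthetic daily data that spills over both ends of 2017
dates <- seq(as.Date("2016-12-30"), as.Date("2018-01-02"), by = "day")
mydata3 <- data.frame(moddate = dates, Consumption = seq_along(dates))

# Keep only rows whose date falls inside 2017 (both bounds inclusive)
mydata4 <- subset(
  mydata3,
  moddate >= as.Date("2017-01-01") & moddate <= as.Date("2017-12-31")
)

head(mydata4)
range(mydata4$moddate)
nrow(mydata4)
```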

now let's do a plotting of this where i

will only

create a plot for one year so i am

saying my data 4 that's my new

data what we got

so

here i am going to take the first column

which is mod data

i am going to take the third column

which is consumption so i am looking at

the date format for one year consumption

values for it and then rest of the

things as we have done earlier let's

look at the plot and this makes more

meaning right so when you look at this

plot it tells me jan to jan it shows me

some kind of pattern where i have

divided the year into months

right and it is broken down into say two

months so jan and march and may and july

and so on but we still see a pattern and

that gives me good understanding of

pattern where i've broken it down into

months

so this is where you have taken time

series in a single year to investigate

further and this is what we see

right now we can clearly see there are

some weekly

oscillations

one more interesting feature is

that at this level of granularity that

is when you are looking at yearly data

there is a drastic decrease in

electricity consumption in early january

and late december during the holidays so

probably we can assume that this is

holidays now i can zoom in further and

look at just jan and feb data

let's see how we do that and let's see

how we work by zooming in the data

further

so to zoom in the data further let's see

how we do it now here we have this

mydata4 which is basically having a

subset right so let's work on this one

so i will say mydata4 which earlier i

was taking data 3 i was doing a subset

and i was giving the date but this time

i will make it more

narrower so i'll say my data 4 i will

say subset from my data 3

and i will choose mod data column which

we have modified with the date format

i will choose the starting date as

2017 01

that is jan and then let's go till feb

and let's create this

now let's look at the head of this so it

shows me we have the data which is jan

and then you you can basically look at

more on this now again as i said earlier

let's find out the minimum of this from

the

first

column so that is basically your mod

data so let's look into this one

and that basically will give me minimum

and maximum let's look at the value so

this one tells me 2017 january 1

and maximum is

your

feb 28th second month

so we are actually looking at two months

data here

let's look at the y minimum so this is i

will look at

column three now what is column three

consumption so let's look at the minimum

value for consumption maximum value of

consumption let's look at the values

which can be given as our limits

now this is the minimum and maximum now

let's do a plotting for this data which

has been narrowed down

for consumption based on my data so i'm

saying

my first column which is mod data and

then third column which is consumption

i'm giving some

labels for

your x-axis y-axis

what is my

consumption

or what is my title here what is the

color and then you see i'm using x limit

to give the minimum and maximum limit

and y limit so let's look at this data

and if you

look at this data

it is specifically for two months and

again i can look at the pattern here

what i can also do is i can add some

grid here

so i can basically look at this data and

make more meaning out of it so it is

bi-weekly data you can see now i can add

a line here using abline and then i can

basically choose what lines i would want

to add horizontally

so that basically allows me to dissect

the data and look at data in a more

meaningful way i can also

add vertical lines so vertical lines is

i'm saying sequence will be minimum

maximum and i'm saying an interval of

seven

so let's do this

and

this basically has added some lines

every week and you can see at the end of

week it is dropping and then it is

starting again it peaks somewhere in the

mid of the week and again it

drops down so this is you're looking at

your consumption data right now what we

can also do is we can create some box

plots so when we looked at zooming in

data for jan and feb you can add some

data points like this so consumption is

highest on the weekdays as i showed you

here and lowest on the weekends so this

is what we are seeing when we are

breaking the data or zooming it further

for a couple of months so we have

vertical grid lines and we have nicely

formatted tick labels that is jan first

and 15th feb first and so on so we can

easily tell which days are weekdays and

weekends with use of these grid lines

and basically breaking it down so there

are many other ways to actually

visualize your time series data

depending on what patterns you're trying

to explore you can use scatter plots you

can use heat maps you can just use

histograms and so on
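The zoomed-in plot with weekly grid lines described above can be sketched as follows. The January and February values are synthetic (weekdays set higher than weekends on purpose), and the variable names, colors and labels are assumptions rather than the code used in the video.

```r
set.seed(42)

# Synthetic two-month series: weekdays around 1550, weekends around 1200
moddate <- seq(as.Date("2017-01-01"), as.Date("2017-02-28"), by = "day")
is_weekend <- format(moddate, "%u") %in% c("6", "7")   # %u: 1 = Monday ... 7 = Sunday
Consumption <- ifelse(is_weekend, 1200, 1550) + rnorm(length(moddate), sd = 30)
mydata4 <- data.frame(moddate, Consumption)

# Axis limits taken from the data itself
xmin <- min(mydata4$moddate, na.rm = TRUE)
xmax <- max(mydata4$moddate, na.rm = TRUE)
ymin <- min(mydata4$Consumption, na.rm = TRUE)
ymax <- max(mydata4$Consumption, na.rm = TRUE)

plot(mydata4$moddate, mydata4$Consumption, type = "l",
     xlab = "Date", ylab = "Consumption",
     main = "Consumption, Jan to Feb 2017 (synthetic)",
     col = "steelblue", xlim = c(xmin, xmax), ylim = c(ymin, ymax))

grid()                                                       # default background grid
abline(h = pretty(c(ymin, ymax)), col = "grey70", lty = 3)   # horizontal reference lines
week_lines <- seq(xmin, xmax, by = 7)                        # one date every 7 days
abline(v = week_lines, col = "red", lty = 2)                 # vertical weekly lines
```

Because the first date here is a Sunday, each vertical line marks the end of a weekend, which makes the weekday peaks between the lines easy to spot.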

now moving further we would want to

explore the seasonality right so when

you further explore the seasonality of

our data

we can use box plots basically to group

the data by different time periods and

display the distribution for each group

now how do we do that

let's come here and let's see how box

plot works so i can just do a simple box

plot and i can choose my consumption

column and that gives me just the

consumption data but this really does

not give me any meaning i can look at

solar data i can look at the wind data

and we can also see some outliers here

so we can create box plots but

if we would want to do a box plot what

is box plot it is basically a visual

display of your five number summary that

is you want to look at your minimum and maximum

you want to look at your 25th percentile

50th percentile that is the median

and 75th percentile so we can use a

quantile function use the consumption

column and then you basically give

a vector which gives you the five number

summary so that's your quantile and then

let's do a box plot

so if you are looking at quantile it

tells me what is the minimum what is

25th percentile 50 75 100 that's from my

consumption column so let's create a box

plot for consumption

let's give it a name as consumption

let's give y axis as consumption and a

limit

for

y-axis

now that's my consumption graph
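As a sketch of the five-number summary and the single box plot described above, with synthetic consumption values. The probabilities passed to `quantile()` give the minimum, the quartiles and the maximum; the title, label and y limit are assumed values.

```r
set.seed(7)
Consumption <- rnorm(500, mean = 1340, sd = 120)  # synthetic values

# Minimum, 25th, 50th (median), 75th percentile and maximum
five_num <- quantile(Consumption, probs = c(0, 0.25, 0.5, 0.75, 1))
five_num

# Box plot of the same column
boxplot(Consumption, main = "Consumption", ylab = "Consumption",
        ylim = c(600, 1800))
```

Note that `boxplot()` draws hinges computed by `fivenum()`, which can differ slightly from the quartiles returned by `quantile()`.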

so i can look at yearly data now that

will make more meaning rather than just

looking at the complete consumption data

so how do we do it yearly so we will say

consumption

and then i will say the year column so

it is consumption but grouped based on

year

so here i can give x axis y axis and i

can give y limit so let's create this

and this makes more meaning we can give

some coloring scheme here but now i'm

looking at 2006 2007 8 9 and so on and

we can look at the data what is the

range right it gives me five percentile

or sorry five number summary of the data

per year and it basically allows me to

look at the seasonality of this
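Grouping a box plot by year can be sketched with the formula interface of `boxplot()`. The data is synthetic, and the column names `year` and `Consumption`, the y limit and the `las = 1` setting (horizontal axis labels) are assumptions.

```r
set.seed(3)

# Three years of synthetic daily consumption
moddate <- seq(as.Date("2015-01-01"), as.Date("2017-12-31"), by = "day")
mydata3 <- data.frame(
  moddate = moddate,
  year = as.integer(format(moddate, "%Y")),
  Consumption = 1350 + rnorm(length(moddate), sd = 120)
)

# Consumption ~ year draws one box per distinct year
bp <- boxplot(Consumption ~ year, data = mydata3,
              main = "Consumption by year", xlab = "Year", ylab = "Consumption",
              ylim = c(600, 1800), las = 1)

bp$names   # group labels
bp$stats   # per group: lower whisker, lower hinge, median, upper hinge, upper whisker
```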

similarly we can create box plot

by just giving consumption yearly group

and here i am giving the title as

consumption y axis

x axis and y limit

wherein i can also use las so this is

one more feature which you can use and

that basically sets the orientation of the tick

labels if you compare this one to the

previous graph

so when i created this previous graph i

had 2006 2008 and i had from 600 to 1800

and if i go for the next one

i am basically seeing more useful

information now let's look at monthly

data

so

i would want to group it based on months

and let's create that so this gives me

the monthly data where i'm looking at

months

and i could select

a particular year or i can just do a

grouping based on months

so

i can have multiple plots to see a

difference here so let's do this

now let's create a box plot for

consumption which is monthly data and

let's give it a color

let's look at the wind data which is

again grouped monthly and let's look at

the solar data which is grouped monthly

now if i zoom in it basically gives me

the seasonality of the data

for your wind for your consumption for

your solar so what we are doing is we

are creating these box plots which are

giving us

values now what i can also do is i could

look at the day wise also but before we

look into this how do i

infer some information from these box

plots which are being created so this is

what we have done where we are looking

at the data for month and these box

plots give me yearly seasonality

which we were seeing in earlier plots

but give some additional insights so if

i look at the data here it tells me the

electricity consumption is generally

higher in winter

now this is based on months so we can

see consumption is higher in winters

and lower in summer so we can obviously

look at our plot we can see where it is

lower where it is higher

and then we can

look at the median and lower two

quartiles are lower in december and

january compared to november and

february so that is you look at the

quartiles and you will see

that

the median and lower two quartiles are

lower in december and january

here jan and december so you can look at

from my plot

now

this is giving you some idea on

seasonality

now

that might be due to business being

closed over holidays now this one we

were also seeing when we looked at time

series for 2017 only and box plot

basically confirms that there is this

consistent pattern throughout the years

now when you look at

your

solar and wind power production both

will give you a yearly seasonality what we

are seeing here

and

if basically i look at the data so

it depends on what parameters you are

choosing but if you look at wind it

will reflect the effect of occasional

extreme wind speeds associated with

storms and other transient weather and since we

are grouping it based on months we can

see this pattern is quite evident every

year
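The three monthly box plots discussed above can be sketched as below. The series are synthetic and built so that consumption and wind peak in winter and solar peaks in summer; the column names, colors and the `par(mfrow = ...)` layout are assumptions.

```r
set.seed(21)

moddate <- seq(as.Date("2015-01-01"), as.Date("2017-12-31"), by = "day")
doy <- as.numeric(format(moddate, "%j"))       # day of year, 1..366
season <- cos(2 * pi * doy / 365)              # close to +1 in January, -1 in July

mydata3 <- data.frame(
  moddate = moddate,
  month = as.integer(format(moddate, "%m")),
  Consumption = 1350 + 150 * season + rnorm(length(moddate), sd = 40),
  Wind = pmax(0, 150 + 80 * season + rnorm(length(moddate), sd = 60)),
  Solar = pmax(0, 90 - 80 * season + rnorm(length(moddate), sd = 15))
)

# Three plots stacked in one figure, then restore the previous layout
old_par <- par(mfrow = c(3, 1))
boxplot(Consumption ~ month, data = mydata3, main = "Consumption", col = "red")
boxplot(Wind ~ month, data = mydata3, main = "Wind", col = "blue")
boxplot(Solar ~ month, data = mydata3, main = "Solar", col = "green")
par(old_par)

# Monthly medians, the line drawn inside each box
cons_med <- tapply(mydata3$Consumption, mydata3$month, median)
solar_med <- tapply(mydata3$Solar, mydata3$month, median)
```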

now what we can do is we can group the

data day wise so here let me again reset

this to

one plot per graph

now i'll say box plot i'll say

consumption which is group based on day

now we know that there is a day column

and let's give a y limit and let's

look at the data so this is where i'm

grouping the data day wise

so you look at 31 days and you look at

the box plot so this is where you are

plotting it on a daily basis so you can

look at the data you can break it down

to

a particular week so here i have given

a day and i have chosen all the 31 days

but i can break it down to a week and i

can look at the data so

if we look at the data per week or per

day we can basically infer that

electricity consumption

where i'm doing a consumption group by

day

is higher on weekdays than on weekends
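The narration groups by a `day` column. One way to make the weekday versus weekend difference visible, shown here as an assumption rather than the approach in the video, is to group by day of the week instead. The data is synthetic.

```r
set.seed(9)

moddate <- seq(as.Date("2017-01-02"), as.Date("2017-12-31"), by = "day")  # starts on a Monday
day_num <- format(moddate, "%u")                  # "1" = Monday ... "7" = Sunday
weekday <- factor(day_num, levels = as.character(1:7),
                  labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))

# Synthetic consumption: lower on Saturday and Sunday
Consumption <- ifelse(day_num %in% c("6", "7"), 1200, 1500) +
  rnorm(length(moddate), sd = 50)
mydata3 <- data.frame(moddate, weekday, Consumption)

boxplot(Consumption ~ weekday, data = mydata3,
        main = "Consumption by day of week", xlab = "Day", ylab = "Consumption")

med <- tapply(mydata3$Consumption, mydata3$weekday, median)
med
```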

so time series with strong seasonality

can

often be represented with models that

can decompose signal into seasonality

and long-term trend now this is

an easy way now how do we look at the

frequency of the data that could be

interesting to see

so let me

look at

say the yearly data

which we were seeing here

now let's go further and here

we have looked at data so what we will

do is we look at the frequency now when

you look at the frequency when you talk

about frequency in your data so we have

the modified date column which gives me

a frequency and if we really look into

the data that will tell me

that the data is on a daily basis so for

that let's look at my data three again

which gives me data and you can just see

all the dates are in

sequence so you have 22 23 24 25 26 and so

on i can load the dplyr

package

that is basically

allowing me to work in a better way now

i can look at the summary of this and

for all my columns

i am seeing the five

number summary date and consumption so

date does not show me anything because

this is not in a date format it is just

a factor but other things have the five

number summary so we are looking at wind

plus solar we are looking at year and

month and day and all these columns

now what we will do is we will want to

find out the count for each

column how many entries does it have

and we will say n a

values should not be considered so let's

look at this one so it tells me for my

particular columns

so let me run this again

and that shows me

for each column how many values you have

and

these

counts

do not include the n a values
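Counting the non-missing entries per column can be sketched in base R as below. The video appears to use dplyr for this step, so treat this as an equivalent alternative rather than the original code. The five-row data frame and its values are synthetic.

```r
# Synthetic rows with a few missing values
mydata3 <- data.frame(
  moddate = as.Date("2011-12-12") + 0:4,
  Consumption = c(1500, 1520, 1510, 1490, 1300),
  Wind = c(120, 95, NA, 210, 180),
  Solar = c(NA, NA, 12, 15, 9)
)

# Number of non-NA entries in every column
non_na <- colSums(!is.na(mydata3))
non_na

# The complementary count of missing values for single columns
sum(is.na(mydata3$Consumption))
sum(is.na(mydata3$Wind))
```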

now similarly i can find out

specifically for consumption i can find

out is there any n a value so i'm saying

is dot n a and let's find out if there

is any n a value or missing value in

consumption it says zero

okay that's good if you look in wind

it tells me there are 1463

entries which are n a

similarly solar

similarly

wind dot solar or wind plus solar so it

gives me a count of n a values that is

missing values

and also values which are not missing so

to understand frequency what we can do

is we can find out the minimum

on the date that is the first column and

i'm saying

n a dot rm is true that is get rid of n

a values and find out the minimum

and let's look at the minimum value

this is the minimum from my modified

date

now if i would want to get the frequency

i can basically use sequence function so

i can say

from x minimum that is the minimum value

i want to look at the frequency that is

day wise and let's just look at five

entries and see if there is a

day by day frequency

so let's look at the value of this and

obviously it tells me there is day wise

frequency so that allows me to look at

the frequency look at the type of it it

is an integer class is a date

so similarly we can say from x minimum

we can basically look at the frequency

month-wise

and i can again look at five records so

that shows me monthly data

right so i can

extract the data for frequency similarly

yearly data and that's also very useful
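Generating date sequences at different frequencies with `seq()` can be sketched as follows. The start date is an assumed example; in the tutorial it is the minimum of the modified date column.

```r
xmin <- as.Date("2006-01-01")   # assumed start date

# Five consecutive dates at daily, monthly and yearly steps
daily <- seq(from = xmin, by = "day", length.out = 5)
monthly <- seq(from = xmin, by = "month", length.out = 5)
yearly <- seq(from = xmin, by = "year", length.out = 5)

daily
monthly
yearly
class(daily)
```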

now

we can select data which has n a values

for wind

so how do i do it i would want to find

out

the wind column and i want to find out

where the values are n a so

i will create a variable

and here i will say my data 3

and then i give a conditional where i

say is n a

in the column so let's do this

now once i have done this

once i have done this i have said that

my

selected wind data from my data 3 where

we said n a values

and i will give the names to this so

names should be in my data 3 i'm

interested in mod data consumption wind

and solar so these are the four columns

i'm interested in let's look at first 10

records here or first 10 rows so that

tells me these are the values where wind

has n a

or missing values
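Selecting the rows where wind is missing, and the complementary rows where it is present, can be sketched with `which()` and `is.na()`. The data frame is synthetic and the names `selwind1` and `selwind2` are hypothetical.

```r
mydata3 <- data.frame(
  moddate = as.Date("2011-12-12") + 0:4,
  Consumption = c(1500, 1520, 1510, 1490, 1300),
  Wind = c(120, 95, NA, 210, 180),
  Solar = c(NA, NA, 12, 15, 9)
)
cols <- c("moddate", "Consumption", "Wind", "Solar")

# Rows where Wind is NA
selwind1 <- mydata3[which(is.na(mydata3$Wind)), cols]
head(selwind1, 10)

# Rows where Wind is not NA
selwind2 <- mydata3[which(!is.na(mydata3$Wind)), cols]
head(selwind2, 10)
```

Wrapping the condition in `which()` returns row positions, so rows where the condition itself is `NA` are dropped instead of showing up as all-`NA` rows.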

i can always do a view and that gives me

the complete data so it basically shows

me 1463

entries and here it shows me all n a

values so you can look at all the way to

the end and it shows me wind has n a

solar does have some value here

in the last row but then also if you see

the numbers have

a difference so you have 1 4 6 1 and

then you have 2 1 7 4 so there is a

difference so there is some data in

between where wind has some values so we

have found out n a values now

what we will do is we will select data

which does not have n a values

so i will call it selected wind 2

i'll again use mydata3 i will say which

but now i am saying not n a

from this column and i will select the

data for the columns so i'm interested

in looking at

10 records and this shows me not n a

value so no more missing values so if i

really look at this data as i saw

earlier which has n a and if i look at

these values which are not n a for the

wind column so looking at these two

result we will know that in year 2011

wind column

has some missing values

so let's focus on year 2011. so how do i

do that let's call it a different

variable i'll say mydata3 i will say

here when i say which where we were

saying n a here i will say the year

should have a value of 2011 and i want

all these columns

let's look at the data here and this is

showing me 2011 but

we

are not seeing all the values so there

are some values but then there are some

missing values also for 2011 based on

whatever analysis we have done so let's

look at the class of this it is

basically a data frame do a view

and this one will help me in finding out

where are the n a values so if you just

scroll down

looking at all the data let's search if

wind column has a n a or a missing value

and i will see

if there is any missing value in which

column or which row it is for the wind

column so we have all the values which

are existing

i could select and search for one

specific value and i'll show you how we

can do that so here let's scroll all the

way down so it's like you're exploring

your data and seeing is

wind column having n a or missing value

for a particular row

and let's scroll here and here you see

there is a missing value for one

particular row so

13th

december 2011 has wind value

15 december has wind value but

your 14th december does not have right

similarly we can search so there was

only one entry which was missing now

that could be for some reason might be

it was not calculated might be it was

not tabulated so we have a missing value

and that

can affect my plotting that can affect

my analysis so let's look at the number

of rows

in this which will tell me how many rows

we have

for 2011. so it tells me 365. so that is

basically the number of days in a year

now we will find out if

there were n a values so we earlier

checked total number of na values per

column

that is

in your row number 265 to 269

we can see here 265 to 269

so this is where we were seeing

are there any n a values right so let's

go back here

and

we want to find out the number of n a

values for a particular year how do i do

it so i can just do a sum i will say is

n a

now i am interested in my data 3

wind column and i am saying my year has

to be 2011 but i am finding out the n a

values

so let's do this and it tells me one and

that's right that's what we saw when we

did a view let's see

how many non-na values you have and that

is 364 so that basically

satisfies my logic so it's 364 plus 1

missing so there are 365 let's look at

the structure of this it tells me you

have modified date and date format you

have consumption wind and solar now

let's create a variable

selected wind4

i will use selected wind 3 which was

having all my n a and

non n a values for 2011. i will say

let's find out the n a value

because i'm interested in finding out

that particular row so i'm saying find

out where the value is n a and i want

all the columns

let's look at this one and this is my

specific

row which has a n a value
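The per-year check described above can be sketched as below: count the missing and non-missing wind values for 2011, then pull out the row that is missing. The data is synthetic with one `NA` placed on 14 December 2011 to mirror the narration; `selwind3` and `selwind4` are hypothetical names.

```r
set.seed(11)

moddate <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by = "day")
mydata3 <- data.frame(moddate, year = 2011L, Wind = runif(365, 20, 400))
mydata3$Wind[mydata3$moddate == as.Date("2011-12-14")] <- NA   # the single gap

# All rows for 2011
selwind3 <- mydata3[which(mydata3$year == 2011), c("moddate", "Wind")]
nrow(selwind3)

# Missing and non-missing counts for that year
n_missing <- sum(is.na(mydata3$Wind[mydata3$year == 2011]))
n_present <- sum(!is.na(mydata3$Wind[mydata3$year == 2011]))
c(missing = n_missing, present = n_present)

# The specific row with the missing value
selwind4 <- selwind3[which(is.na(selwind3$Wind)), ]
selwind4
```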

now

we know that data follows a day wise

frequency which we have clearly seen now

let's select data which has n a and non

n a values

so

let's say let's call it test one i will

use selected wind 3 which has

n a and non n a values but now i will say

i want the modified date which should be

greater than 12 12 2011 now remember we

had when we were doing a view we saw

that one particular day or what we see

here 14th of december there is no data

so i will select a subset of data which

includes this n a and non n a that is

might be i can take 13th of december and

15th of december so let's start from 12

12

so the date should be greater than 12 12

that means 13 and it should be less than

16 so that is 15th

and the columns

right so

now we have some data let's look at this

so i have

a i've selected a subset of data i could

have done this using subset also so i

have n a and non n a values now

why are we doing this so sometimes you

might have some data for a particular

column and you may want to find out if

there are any missing values might be

you want to fill them up or replace them

with something so that is usually useful

when you are doing a trend detection

so say for example you have data for

every month and might be in one one of

the months you have missed or might be

you have data for every year collected

monthly and then in one of the years for

couple of months you don't have the data

like i can say 2016 i have data for all

12 months 2017 all 12 months 2018 might

be i don't have data from march and june

2019 i don't have data for same months

so i can forward fill or backward fill

them using the previous year's same

month data so we can do that so here i

have test data where i've extracted a

subset of data

i can look at

the

class of this it is a data frame

structure of this it has the columns now

let's use the library

function and load the tidyr package

and what we will do is we will fill it

up so i will use test one i will fill

the wind column which has a missing

value now once you do this if you notice

it has done a forward fill so it has

taken the previous value and it has just

filled up that so you can

fill up the data using different

directions such as down and up

so we can take

care of missing values

in our frequency data which allows us to

basically

analyze the data in a better way now

here we will want to also look at some

more data so this is to deal with

frequencies and fill a column

wherein you can take care of missing

values with a forward fill so filling values

can be done in different directions as i

said and you may want to first convert

your time series to specified frequency

if

your data does not have a frequency but

we had now if you do not have a

frequency might be you can convert it

into a frequency such as weekly daily

monthly as i showed you and then

basically you can

do a forward fill

for the value so for example if i have

my data i can break it down into weekly

and then look at the values and if there

are any values missing for weekly data i

can use a forward fill so that can take

care of my frequency data
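Forward filling a missing value with `tidyr::fill()` can be sketched as follows. The three rows and their wind values are synthetic, and `test1` only mirrors the name heard in the narration. The `.direction` argument accepts `"down"`, `"up"`, `"downup"` and `"updown"`.

```r
library(tidyr)

# Three consecutive days with the middle wind value missing (synthetic values)
test1 <- data.frame(
  moddate = as.Date("2011-12-13") + 0:2,
  Wind = c(95.2, NA, 210.7)
)

# Forward fill: carry the previous value down into the gap
filled_down <- fill(test1, Wind, .direction = "down")
filled_down

# Backward fill: pull the next value up into the gap
filled_up <- fill(test1, Wind, .direction = "up")
filled_up
```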

then

let's look at the trends of the data

which is the last part of this project

so basically let's look at the trend so

when you say trend what does that mean

so in time series data

you always have some kind of trend

so that will exhibit some slow gradual

variability in addition to

higher frequency variability such as

seasonality and noise

now

to visualize these trends what we do is

we use what we call as rolling means so

we know how our data is

spread over year or month or day

but how about looking at a rolling

average and see what is the difference

so a rolling mean

will tend to smooth a time series by

averaging out the variations on

time scales

much shorter than the window

size so there is something called as

windowing where you can choose a set of

time frame you can also average out any

seasonality on a time scale equal to

window size

so this will allow you to look at lower

frequency variation in the data

so when we are looking at electricity

consumption time series we already saw

there is a weekly pattern there is a

yearly seasonality which we saw using

box plots so we can also look at the

rolling means of the time scales how do

we do that so for this you can use some

package like zoo and then you can

basically use a rolling mean

using this zoo package

and you can say what

is the

frequency with which you want to

calculate the rolling mean

now how do we do this

let's look at this data so here i'm

going to my look at my data 3 which we

have been using so far

now let's call it a 3 day test you can

give it any name i am going to use my

data 3 i am using the pipe operator

now i will use dplyr and i will

arrange the data in descending order here now

you can always break it down step by

step and you can see the result of this

so i'm going to arrange this data in

descending order of year

so obviously my last one 2017 or 2018

will be on the top you want to group the

data by year so it depends on how many

years we have we'll see so you can group

the data by year now this data is then

used to basically mutate so mutate

function is going to allow me to use

this rolling mean so i'll call it as

test 0 3

day so i'm going to calculate a rolling

mean every three days

for my consumption column

and

basically let's ungroup this so let's

see how this

works sorry yeah let's look at this and

here when i'm doing a three-day test

let's look at the result of this and

then i'll explain this so if you see

here we have the test three-day column

now this has the rolling average now

what does that mean so first value here

what we see is 1367

is the average consumption in 2017

from the first date with the data point

on either side of it that is you can

look at

this

date so 1130

then you look at

you are looking at the value 1367

here so you look at

1130 1441 1530 if i take a

mean of these so for example if i would

just do this part

and that

is giving me

mean okay because i have a comment so

let's basically add anything as comment

and then let's do this so it gives me

one three six seven that's what we are

seeing here right so you are getting

a rolling average every three days
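A sketch of the three-day centered rolling mean with dplyr and `zoo::rollmean()`. The first three consumption values (1130, 1441, 1530) are taken from the narration as heard and average to 1367; the remaining values, the column name `test_03da` and the use of `fill = NA` are assumptions.

```r
library(dplyr)
library(zoo)

mydata3 <- data.frame(
  moddate = as.Date("2017-01-01") + 0:5,
  year = 2017L,
  Consumption = c(1130, 1441, 1530, 1400, 1300, 1250)
)

threedaytest <- mydata3 %>%
  arrange(desc(year)) %>%                 # latest year first
  group_by(year) %>%                      # windows never cross a year boundary
  mutate(test_03da = zoo::rollmean(Consumption, k = 3, fill = NA)) %>%
  ungroup()

threedaytest

# Manual check of the second row: mean of the value and its two neighbours
mean(c(1130, 1441, 1530))
```

With the default centered alignment, the first and last rows of each group have no complete window and are set to `NA` by `fill = NA`.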

similarly if you want every five days it

takes five values and puts their mean at the

middle position right so you can always find

out the mean

rolling mean

for a particular frequency now let's do

that for seven days that is weekly data

and yearly data that is 365 days so how

do i do it same logic my data test

now i am using my data 3 i am arranging

it in a descending order i am grouping

by year

so when you do a group by year so

earlier when we did a grouping by and

when we looked at the data it was

telling me how many rows we had

right so let's do a grouping by year and

let's say test zero seven so that's a

rolling average every seven days and i'm

also saying take care of the n a values

similarly i'm getting rolling average

every 365 days might be you can do

quarterly might be you can do half

yearly and let's do this so let's

create this my data test and let's look

at the result of this so i will use my

data test i will say arrange

based on modified date now we know there

is a column called modified date i want

to just look at 2017 data so i'm doing a

filter

right and then i will choose what are

the columns i'm interested in so i will

look at the 7 and 365 day and let's look

at say first seven records so let's do

this

and that basically gives me the

consumption value modified date year and

my rolling seven day average or

seven day mean

which is for first seven days and then

365 you will not see the data here but

if i do a view on this i can basically

see the values

so you can always select a particular

column to see the values

these are the values for every 7 day

rolling average

this is for 365 days every 365 days so

you see all the values are missing but

every 365 entry you will have basically

some data
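The 7-day and 365-day rolling means computed per year can be sketched as below. The data is synthetic, and the column names `test_07da` and `test_365da` are assumptions. Because the windows are computed within each year, a 365-day window fits exactly once in a 365-day year, which is why that column is almost entirely `NA`.

```r
library(dplyr)
library(zoo)

set.seed(2017)
moddate <- seq(as.Date("2016-01-01"), as.Date("2017-12-31"), by = "day")
mydata3 <- data.frame(
  moddate = moddate,
  year = as.integer(format(moddate, "%Y")),
  Consumption = 1350 + rnorm(length(moddate), sd = 100)
)

mydatatest <- mydata3 %>%
  arrange(desc(year)) %>%
  group_by(year) %>%
  mutate(
    test_07da = zoo::rollmean(Consumption, k = 7, fill = NA),
    test_365da = zoo::rollmean(Consumption, k = 365, fill = NA)
  ) %>%
  ungroup()

# First seven rows of 2017 in date order
mydatatest %>%
  arrange(moddate) %>%
  filter(year == 2017) %>%
  select(Consumption, moddate, year, test_07da, test_365da) %>%
  head(7)

# How many 365-day values exist per year
tapply(!is.na(mydatatest$test_365da), mydatatest$year, sum)
```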

now let's do a plotting of this and

basically visualize this data which we

are seeing rolling average so let me

first do a plotting one plot per graph

and let's do a plotting i will take

consumption data

x-axis y-axis

color and give a title to this so let's

create this and that's my consumption

data which is

spread over a period of time and that's

fair enough but now let's add some more

plot to this so i will add the seven day

rolling average to this

so for second plot to be added in the

same one in r you can use points

so i will say points i will choose seven

day column

type is line width

x limit y limit and color so let's do

this

and that's my

pattern seven day rolling average which

basically gives me some kind of trend

similarly i can add one more here and

this time i will choose the 365 day

and look at the pattern

lines so now you see some dots here well

you could do it in a different way so i

can just add legend to this and i can

say legend will be

where in x axis and y axis so i am

saying it will be 2500

and y is 1800 so my legend will come in

somewhere in here i am saying my legend

will have consumption

test

and this one i can give some names i can

give what is the color

i can say what kind of

legend it explains what is

for each color and then basically a

vector so let's add a legend to this and

i've added a legend now you can do a

zoom and look at the data

and

here i see that

my x axis is fine but y axis is going a

little

out of my plotting area so i can

actually change that so here i have 1800

how about making it 1600 and let's look

at this one

so

we can basically

uh go for this one and start again here

plot and points and line and then add a

legend right and you can basically place

your legend anywhere in the plot so this

basically is giving

me the trend what i'm looking at my

rolling average
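Overlaying the rolling means on the raw series and adding a legend can be sketched as follows. The data is synthetic, and for simplicity the rolling means here run over the whole series rather than per year, so the 365-day curve is a continuous line instead of isolated dots. The legend uses the `"topright"` keyword instead of the x and y coordinates mentioned in the narration; colors and labels are assumptions.

```r
library(zoo)

set.seed(5)
moddate <- seq(as.Date("2015-01-01"), as.Date("2017-12-31"), by = "day")
doy <- as.numeric(format(moddate, "%j"))
weekend <- format(moddate, "%u") %in% c("6", "7")

# Yearly cycle plus a weekend dip plus noise
Consumption <- 1350 + 150 * cos(2 * pi * doy / 365) -
  ifelse(weekend, 200, 0) + rnorm(length(moddate), sd = 40)

roll7 <- rollmean(Consumption, k = 7, fill = NA)       # removes the weekly pattern
roll365 <- rollmean(Consumption, k = 365, fill = NA)   # leaves the long-term trend

plot(moddate, Consumption, type = "l", col = "grey70",
     xlab = "Date", ylab = "Consumption",
     main = "Consumption with rolling means (synthetic)",
     ylim = c(900, 1800))
points(moddate, roll7, type = "l", lwd = 2, col = "orange")  # second series on the same plot
lines(moddate, roll365, lwd = 3, col = "brown")              # lines() is equivalent here
legend("topright",
       legend = c("Consumption", "7-day rolling mean", "365-day rolling mean"),
       col = c("grey70", "orange", "brown"), lty = 1, lwd = c(1, 2, 3), bty = "n")
```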

so similarly you can look at the trend

for wind and solar data so what we are

seeing here is when you look at trend

this is one more way of looking at it

you can always create plots in different

ways

so

seven day rolling mean has smoothed out

all weekly seasonality which we were

seeing here in my graph where you look

at every seventh day preserving the

yearly seasonality so

seven day will tell

that electricity consumption is

typically higher in winter and lower in

summer so better is you break it down

yearly so here if you look at every year

you can see when is winter when is

summer what is the seasonality what your

trend what you are seeing here and if

there is a decrease or increase

for few weeks

every winter

so similarly if you look at 365 now as

as i said rolling average

basically

reduces the variation so if i look at

365 rolling mean we can see long term

trend

in electricity consumption is pretty

flat now that's what we are seeing it's

kind of pretty flat there is not much

variation over years if you really join

these dots

so

we can basically see some highs and lows

and that gives me a trend now this is

how you can do a trend detection and

similarly we can do plotting for wind

and solar so this is a

small project which i demonstrated using

r

now all this code which you have here in

the form of a project dot r file you can

find here in my github page this is a

document which explains some things feel

free to download this and you can add

details to it this is the sample data

set which you can also find in my

repository in the data sets folder so

continue learning and continue

practicing r with that we have come to

the end of this full course on r

programming think we missed anything

important do let us know in the comment

section below thank you so much for

being here and do watch out for more

videos from us until then keep learning

and stay tuned to simplilearn

