
dplyr - Data manipulation in RStudio

Fortune Fornax

dplyr is an R package that provides a grammar of data manipulation. It makes accessing the information in a dataset easier and faster by offering a set of verbs (functions) for transforming data. Here, we discuss some of its key functions using a COVID-19 dataset:




  • filter( ) #keeps the rows whose values meet a condition

  • mutate( ) #adds new columns or variables

  • select( ) #selects columns (variables) based on their names

  • summarize( ) #reduces many values to a single summary, e.g. a sum

  • group_by( ) #forms groups of rows so that summaries are computed per group



Examples


filter( )

#To keep only the rows for one country
library("dplyr")
COVID_19 %>%        #%>% is the pipe operator; it passes COVID_19 to filter()
  filter(location == "Pakistan")
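
filter( ) can also take more than one condition. The sketch below is only an illustration: covid_toy is a small made-up stand-in for the COVID_19 data, with the same column names used in this post but invented values.

```r
library(dplyr)

# Hypothetical stand-in for COVID_19 (values are made up for illustration)
covid_toy <- data.frame(
  location = c("Pakistan", "Pakistan", "India", "India"),
  cases    = c(100, 250, 400, 50),
  deaths   = c(2, 5, 8, 1)
)

# Conditions separated by commas are combined with AND
pakistan_high <- covid_toy %>%
  filter(location == "Pakistan", cases > 150)

# %in% keeps rows whose location matches any entry in the vector
two_countries <- covid_toy %>%
  filter(location %in% c("Pakistan", "India"))
```

Separating conditions with commas is usually easier to read than joining them with &, although both give the same rows.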


mutate( )

#To estimate the number of individuals still alive, we subtract deaths from cases and store the result in a new column
Alive_data = COVID_19 %>%
  mutate(Alive_individuals = cases - deaths)
Alive_data


select( )

#Here, we select only three columns from the data, i.e. location, cases and deaths, to form a new dataset
Data_location_cases_deaths = COVID_19 %>%
  select(location, cases, deaths)
Data_location_cases_deaths
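
select( ) can also drop or rename columns while selecting them. A minimal sketch, again on a made-up covid_toy data frame rather than the real COVID_19 data:

```r
library(dplyr)

# Hypothetical stand-in for COVID_19 (values are made up for illustration)
covid_toy <- data.frame(
  location = c("Pakistan", "India"),
  cases    = c(100, 400),
  deaths   = c(2, 8)
)

# A minus sign drops a column and keeps all the others
no_deaths <- covid_toy %>%
  select(-deaths)

# new_name = old_name renames a column as it is selected
renamed <- covid_toy %>%
  select(country = location, cases)
```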


summarize( )

#To add up the cases in the whole dataset into a single value
Total_cases = summarize(COVID_19, Total_cases = sum(cases))
Total_cases
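
If the cases column contains missing values (an assumption; this post does not say whether the dataset has any), sum( ) returns NA. The sketch below shows one way to handle that on a made-up covid_toy data frame, using the na.rm argument of sum( ).

```r
library(dplyr)

# Hypothetical stand-in for COVID_19 with one missing value (made-up numbers)
covid_toy <- data.frame(
  location = c("Pakistan", "Pakistan", "India"),
  cases    = c(100, NA, 400)
)

# A single NA makes the whole sum NA
with_na <- summarize(covid_toy, Total_cases = sum(cases))

# na.rm = TRUE ignores missing values before summing
without_na <- summarize(covid_toy, Total_cases = sum(cases, na.rm = TRUE))
```

Whether dropping missing values is appropriate depends on why they are missing, so it is worth checking the data before relying on the total.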


group_by( )

#To add up the deaths separately for each location
Total_deaths = COVID_19 %>%
  group_by(location) %>%
  summarize(Total_deaths = sum(deaths))
Total_deaths
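
The verbs can be chained in one pipeline. As a rough sketch on a made-up covid_toy data frame (the values and the Death_rate column are illustrative assumptions, not part of the COVID_19 data), the code below computes two totals per location and then derives a ratio from them.

```r
library(dplyr)

# Hypothetical stand-in for COVID_19 (values are made up for illustration)
covid_toy <- data.frame(
  location = c("Pakistan", "Pakistan", "India", "India"),
  cases    = c(100, 250, 400, 50),
  deaths   = c(2, 5, 8, 1)
)

by_location <- covid_toy %>%
  group_by(location) %>%                  # one group per location
  summarize(
    Total_cases  = sum(cases),            # one row per group
    Total_deaths = sum(deaths)
  ) %>%
  mutate(Death_rate = Total_deaths / Total_cases)  # new column from the totals

by_location
```

summarize( ) can return several summaries at once, and mutate( ) can then work on the summarized columns like on any other data frame.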
