
Getting Started with Data Visualization Using RStudio

Fortune Fornax

Updated: Jun 5, 2020

Today, technology gives us effortless access to large amounts of information. This overload of data can lead to misinterpretation and to delays in making decisions, so learning data visualization and analysis has become necessary for structuring and managing data. The COVID-19 pandemic is one example: scientists struggled with exceedingly large volumes of patient information.


Here, we aim to create plots of the COVID-19 data retrieved from


The plots generated here are not meant to present facts and figures; they offer a direction for practicing and mastering these skills.



Basic Steps for Getting Started

Step#1 Install Packages

Step#2 Load Libraries

Step#3 Import File

Step#4 Filter data according to conditions (if any) for the plots

Step#5 Aesthetic Mapping: Specify the x- and y-axes (e.g. x = cases, y = deaths)

Step#6 Add geometries of "ggplot2"

geom_point for scatter plots

geom_histogram for histograms

geom_boxplot for boxplots

Step#7 Customize the x-axis title, y-axis title, and main title

Step#8 Apply a theme of your choice



Step #1 & 2 : Installing Packages and Loading Libraries


Before giving any command to R, we should know the purpose of the packages and libraries we are using. A package needs to be installed only once; however, its library has to be loaded each time we restart RStudio.


install.packages("ggplot2") #gg stands for grammar of graphics.

library("ggplot2") #It helps in creating plot/graphics


install.packages("dplyr") #helps manipulate the data.

library("dplyr") #select(), mutate(), filter() and arrange() are some of its functions.


install.packages("dslabs") #dslabs stands for data science labs

library("dslabs") #consists of built-in datasets that can be used for data analysis.


install.packages("RColorBrewer")

library("RColorBrewer") #provides color palettes and color schemes


install.packages("ggthemes")

library("ggthemes") #consists of different themes to customize plots


install.packages("lubridate")

library("lubridate") #helps manipulate data consisting of dates and time
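
Because installation is a one-time step, it can be convenient to install only the packages that are missing. The following is a minimal sketch rather than part of the original workflow: `missing_packages` is a hypothetical helper name, and restricting the install step to interactive sessions is an assumption about how the script is run.

```r
# Hypothetical helper: return the requested packages that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

wanted <- c("ggplot2", "dplyr", "dslabs", "RColorBrewer", "ggthemes", "lubridate")
to_install <- missing_packages(wanted)

# Install only what is missing. This needs an internet connection, so the
# sketch runs it only in an interactive session such as the RStudio console.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

The library() calls still have to be run in every new session; only the installation can be skipped.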



Step #3 : Importing Data File


Files can be imported through RStudio's Import Dataset option:



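or you can simply type:

library("readxl")   #read_excel() comes from the readxl package (install it once with install.packages("readxl"))

`COVID-19` <- read_excel("COVID-19.xlsx") #COVID-19 is the object name and can be changed; the backticks are needed because the name contains a hyphen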
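
A name containing a hyphen deserves some care, because R normally reads a hyphen as a minus sign. The sketch below illustrates this with made-up values (not real COVID-19 figures); `covid19` is a hypothetical alias chosen here for illustration.

```r
# Without backticks, R reads COVID-19 as "COVID minus 19".
COVID <- 100
result <- COVID-19            # a subtraction, not an object called COVID-19

# Backticks allow the non-standard name.
`COVID-19` <- data.frame(location = c("Italy", "Spain"), deaths = c(5, 7))

# make.names() suggests a syntactically valid alternative.
suggested <- make.names("COVID-19")

# A plain name (hypothetical alias) can be typed without backticks.
covid19 <- `COVID-19`
```

Either style works, as long as the same name is used consistently in every later command.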



Get Going with Visuals (Steps #4, 5, 6, 7)


1- Scatter Plot


library("ggplot2")          #to create the plot
library("dplyr")            #because we are using its filter() function
`COVID-19` %>%              #data frame name (in backticks because of the hyphen), followed by the pipe
  filter(location == "Pakistan") %>%   #location is the column that holds the country names
  ggplot(aes(x = cases, y = deaths)) +
  geom_point(size = 1, color = "red") +   #geom_point draws the scatter plot; color is set outside aes() so the points are drawn in red
  xlab("Cases") +                  #customize title of x-axis
  ylab("Deaths") +                 #customize title of y-axis
  ggtitle("Covid-19 Statistics")   #main title of plot
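
In ggplot2, anything placed inside aes() is treated as data to be mapped, while arguments outside aes() are fixed settings. The sketch below contrasts the two with a made-up data frame (`toy`, for illustration only, not real figures).

```r
library("ggplot2")

# Made-up numbers, for illustration only.
toy <- data.frame(cases = c(10, 50, 120, 300), deaths = c(0, 2, 5, 14))

# Inside aes(): "red" is treated as a data value, so ggplot2 picks a colour
# from its default palette and adds a legend entry labelled "red".
mapped <- ggplot(toy, aes(x = cases, y = deaths, color = "red")) +
  geom_point()

# Outside aes(): every point is drawn in the literal colour red, with no legend.
fixed <- ggplot(toy, aes(x = cases, y = deaths)) +
  geom_point(color = "red")
```

Printing `mapped` and `fixed` one after the other in RStudio shows the difference in the Plots pane.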




2- Barplot



library("ggplot2")
library("dplyr")
library("RColorBrewer")       #provides color palettes
`COVID-19` %>%
  filter(location %in% c("Australia", "France", "Italy", "Spain", "Japan", "Canada", "China", "Pakistan", "Belgium", "Germany", "Afghanistan", "Bangladesh")) %>%   #%in% keeps the rows whose location matches any item in the list
  ggplot(aes(x = reorder(location, deaths), y = deaths, fill = location)) +
  #the function reorder() is used to order the locations on the basis of deaths
  geom_bar(stat = "identity") +      #geom_bar for bar plot
  coord_flip() +                     #flips the axes so that the bars are horizontal
  xlab("Location") +
  ylab("Deaths") +
  ggtitle("Covid-19 Statistics")
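
When a data set holds several rows per location (for example one row per day), geom_bar(stat = "identity") stacks those rows into one bar, while reorder() sorts by the mean of each group by default. Summing the values first keeps the bar height and the ordering consistent. A minimal sketch with made-up daily records (`daily` and `totals` are illustrative names, not from the original data):

```r
library("dplyr")
library("ggplot2")

# Made-up daily records, for illustration only (not real COVID-19 figures).
daily <- data.frame(
  location = c("Italy", "Italy", "Spain", "Spain", "Japan"),
  deaths   = c(10, 20, 5, 15, 2)
)

# One row per location, holding the total number of deaths.
totals <- daily %>%
  group_by(location) %>%
  summarise(deaths = sum(deaths)) %>%
  arrange(desc(deaths))

# geom_col() is a shorthand for geom_bar(stat = "identity").
bar_plot <- ggplot(totals, aes(x = reorder(location, deaths), y = deaths, fill = location)) +
  geom_col() +
  coord_flip()
```

Whether this step is needed depends on how the imported file is organized.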




3- Box Plot


library("ggplot2")
library("dplyr")
`COVID-19` %>%
  filter(Continent %in% c("Asia", "America", "Europe")) %>%
  ggplot(aes(x = Continent, y = deaths, color = Continent)) +
  scale_y_log10() +          #transforms the y-axis to a log scale (base 10)
  geom_boxplot(size = 1) +   #geom_boxplot for boxplots
  xlab("Continents") +
  ylab("Deaths") +
  ggtitle("Covid-19 Statistics")
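
A log scale cannot display zeros: the logarithm of 0 is negative infinity, and ggplot2 drops such values with a warning. The sketch below shows the effect on a made-up vector (illustrative values only) and one possible way to handle it, which is to keep only the positive counts.

```r
# Made-up death counts, for illustration only.
deaths <- c(0, 1, 10, 100, 0, 1000)

# Zeros turn into -Inf on a base-10 log scale.
log_deaths <- log10(deaths)

# Keeping only positive values avoids the non-finite entries.
positive <- deaths[deaths > 0]
log_positive <- log10(positive)
```

In the pipeline above, the same idea could be expressed as an extra filter(deaths > 0) step before ggplot(); this is a suggestion, not something the original code does.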




4- Line Graph



library("ggplot2")
library("dplyr")
library("RColorBrewer")
library("ggthemes")      #extra themes to customize plots; theme_bw() used below comes with ggplot2 itself

`COVID-19`$dateRep       #dateRep is the column with the dates of the reported deaths and cases; the $ sign extracts the data in this column

as.character(`COVID-19`$dateRep)  #as.character() returns the dates in character form; the result is printed but not stored, so the column itself is unchanged

`COVID-19` %>%
  filter(location %in% c("Spain", "Italy")) %>%
  ggplot(aes(x = dateRep, y = deaths, color = location)) +
  geom_line(size = 1) +          #size sets the line thickness (newer ggplot2 versions call this linewidth)
  xlab("Months") +
  ylab("Deaths") +
  ggtitle("Covid-19 Statistics") +
  theme_bw()                     #provides plot with white background
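
A line graph over time works best when the date column is stored as real dates rather than text, because text is sorted alphabetically. The sketch below assumes, purely for illustration, that the dates arrive as text in day/month/year form; the actual format of dateRep depends on how the file was imported.

```r
# Made-up dates as text, assumed to be in day/month/year form.
date_text <- c("05/06/2020", "04/06/2020", "31/12/2019")

# Sorting the text puts the December 2019 entry last, which is not chronological.
text_order <- sort(date_text)

# as.Date() with an explicit format converts the text into Date values.
dates <- as.Date(date_text, format = "%d/%m/%Y")
date_order <- sort(dates)
```

The lubridate package loaded earlier offers dmy() for the same kind of conversion.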




5- Histogram



library("ggplot2")
library("dplyr")
`COVID-19` %>%
  filter(location == "China") %>%
  ggplot(aes(x = cases)) +
  scale_x_log10() +            #converts x-axis scale to log scale
  scale_y_log10() +            #converts y-axis scale to log scale
  geom_histogram(bins = 30, color = "black", fill = "lightblue") +   #bins sets the number of bars
  xlab("Cases") +
  ylab("Count") +
  ggtitle("Covid-19 Cases in China")






















