Today, technology makes it effortless to access large amounts of information. Too much data, however, can lead to misinterpretation and delayed decisions, so data visualization and analysis have become essential skills for structuring and managing data. The COVID-19 pandemic is a case in point: scientists had to work with exceedingly large volumes of patient data.
Here, we aim to create plots of the COVID-19 data retrieved from
https://data.europa.eu/euodp/en/data/dataset/covid-19-coronavirus-data/resource/55e8f966-d5c8-438e-85bc-c7a5a26f4863
The plots generated here are not intended to present facts and figures, but to offer practice for mastering plotting skills.
Basic Steps for Getting Started
Step#1 Install Packages
Step#2 Load Libraries
Step#3 Import File
Step#4 Filter data according to conditions (if any) for the plots
Step#5 Aesthetic Mapping: Specify the x- and y-axes (e.g. x = cases, y = deaths)
Step#6 Add geometries of "ggplot2"
geom_point for scatter plots
geom_histogram for histograms
geom_boxplot to create boxplots
Step#7 Customize the x-axis title, y-axis title and main title
Step#8 Apply theme as per choice
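The steps above can be sketched end to end on a tiny made-up data frame. The `toy` data and its values are invented purely for illustration and stand in for the imported file; the sketch assumes ggplot2 and dplyr are already installed.

```r
library(ggplot2)  # Step 2: load the plotting library
library(dplyr)    # Step 2: load the data-manipulation library

# Step 3 stand-in: a made-up data frame instead of an imported file
toy <- data.frame(
  location = c("A", "A", "A", "B", "B", "B"),
  cases    = c(10, 20, 40, 5, 15, 30),
  deaths   = c(1, 2, 5, 0, 1, 3)
)

p <- toy %>%
  filter(location == "A") %>%            # Step 4: keep the rows of interest
  ggplot(aes(x = cases, y = deaths)) +   # Step 5: aesthetic mapping
  geom_point() +                         # Step 6: geometry
  labs(x = "Cases", y = "Deaths", title = "Toy example") +  # Step 7: titles
  theme_minimal()                        # Step 8: theme

print(p)
```

Each later plot in this article follows the same skeleton; only the filter condition, the mapping and the geometry change.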
Step #1 & 2 : Installing Packages and Loading Libraries
Before giving any command to R, we should know the purpose of the packages and libraries we are using. A package needs to be installed only once; however, its library has to be loaded each time we restart RStudio.
install.packages("ggplot2") #gg stands for grammar of graphics.
library("ggplot2") #It helps in creating plot/graphics
install.packages("dplyr") #helps manipulate the data.
library("dplyr") #select(), mutate() , filter(), reorder() are some of its functions.
install.packages("dslabs") #dslabs stands for data science labs
library("dslabs") #consists of built-in datasets that can be used for data analysis.
install.packages("RColorBrewer")
library("RColorBrewer") #provides color palettes and color schemes
install.packages("ggthemes")
library("ggthemes") #consists of different themes to customize plots
install.packages("lubridate")
library("lubridate") #helps manipulate data consisting of dates and time
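Because installation is needed once but loading is needed in every session, the two steps can be separated. The following is a minimal sketch of one way to do that, assuming a CRAN mirror is reachable; the helper variables `pkgs` and `missing_pkgs` are names chosen here for illustration.

```r
# Packages this tutorial uses
pkgs <- c("ggplot2", "dplyr", "dslabs", "RColorBrewer", "ggthemes", "lubridate")

# Install only the ones that are not already available (needed once per machine)
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Load every package (needed in each new R session)
invisible(lapply(pkgs, library, character.only = TRUE))
```

Running this at the top of a script avoids reinstalling packages that are already present.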
Step #3 : Importing Data File
The file can be imported in RStudio by going to Files and clicking Import Dataset:
![](https://static.wixstatic.com/media/a27d24_2634bd50409e43aaa8926d4003044fc1~mv2.png/v1/fill/w_851,h_661,al_c,q_90,enc_auto/a27d24_2634bd50409e43aaa8926d4003044fc1~mv2.png)
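or you can simply type:
install.packages("readxl") #needed only once; reads Excel files
library("readxl")
`COVID-19` <- read_excel("COVID-19.xlsx") #COVID-19.xlsx is the file name; the object can be renamed, and the backticks are required because this name contains a hyphen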
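After importing, it helps to check which columns exist and how R has typed them before filtering or plotting. The sketch below uses a made-up data frame, `covid_toy`, in place of the real spreadsheet; its column names follow the ones used in this article's code, and its values are invented.

```r
# Made-up rows standing in for the imported spreadsheet
covid_toy <- data.frame(
  dateRep  = as.Date(c("2020-03-01", "2020-03-02", "2020-03-03")),
  cases    = c(4, 7, 12),
  deaths   = c(0, 1, 1),
  location = c("Pakistan", "Pakistan", "Pakistan")
)

names(covid_toy)          # column names available for filter() and aes()
str(covid_toy)            # type of each column (Date, numeric, character)
head(covid_toy, 2)        # first two rows
summary(covid_toy$cases)  # quick numeric summary of one column
```

If a column name in the real file differs from the one used in a plot, the plot code has to be adjusted to match.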
Get Going with Visuals (Steps #4, 5, 6, 7)
1- Scatter Plot
library("ggplot2") #to create plot
library("dplyr") #because we are using its filter() function
`COVID-19`%>% #data frame name (in backticks because of the hyphen) followed by the pipe, which passes it to the next function
filter(location == "Pakistan")%>% #location is the column name for different countries
ggplot(aes(x= cases, y = deaths))+
geom_point (size = 1, color = "red") + #geom_point to plot scatter plots; a fixed color is set outside aes()
xlab("Cases")+ #customize title of x-axis
ylab("Deaths")+ #customize y-axis
ggtitle("Covid-19 Statistics") #main title of plot
![](https://static.wixstatic.com/media/56e222_fb8586d7d9b7410db556529b265e399a~mv2.png/v1/fill/w_704,h_439,al_c,q_85,enc_auto/56e222_fb8586d7d9b7410db556529b265e399a~mv2.png)
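In ggplot2, a color written inside `aes()` is treated as data to be mapped, while a color given directly to the geometry is applied as-is. The following sketch contrasts the two on a made-up data frame; `toy`, `p_fixed` and `p_mapped` are illustrative names, not part of the COVID-19 file.

```r
library(ggplot2)

toy <- data.frame(
  cases    = c(10, 20, 40, 5, 15, 30),
  deaths   = c(1, 2, 5, 0, 1, 3),
  location = rep(c("A", "B"), each = 3)
)

# Fixed colour: set outside aes(), every point is red and no legend is drawn
p_fixed <- ggplot(toy, aes(x = cases, y = deaths)) +
  geom_point(color = "red", size = 2)

# Mapped colour: set inside aes(), one colour per location plus a legend
p_mapped <- ggplot(toy, aes(x = cases, y = deaths, color = location)) +
  geom_point(size = 2)

print(p_fixed)
print(p_mapped)
```

Use the fixed form when all points belong to one group, and the mapped form when color should distinguish groups.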
2- Barplot
library("ggplot2")
library("dplyr")
library("RColorBrewer") #provides with color palette
`COVID-19`%>%
filter(location %in% c("Australia", "France", "Italy", "Spain", "Japan", "Canada","China", "Pakistan", "Belgium", "Germany", "Afghanistan", "Bangladesh"))%>% #%in% keeps the rows whose location matches any item in the list
ggplot(aes(x= reorder(location,deaths), y = deaths, fill = location))+
#the function reorder is used to order the locations on the basis of deaths
geom_bar(stat = "identity")+ #geom_bar for bar-plot
coord_flip()+
xlab("Location")+
ylab("Deaths")+
ggtitle("Covid-19 Statistics")
![](https://static.wixstatic.com/media/56e222_397da064386744d0a4ac75a2bb33931a~mv2.png/v1/fill/w_704,h_439,al_c,q_85,enc_auto/56e222_397da064386744d0a4ac75a2bb33931a~mv2.png)
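By default `reorder()` sorts the categories by the mean of the second argument within each category. When the data has one row per day, `geom_bar(stat = "identity")` stacks those rows, so the bar height is a total, and ordering by the sum may be closer to what the plot shows. This sketch uses made-up numbers to show the difference.

```r
# Made-up data: several rows per location, as in a daily report
toy <- data.frame(
  location = c("A", "A", "A", "A", "B", "C", "C"),
  deaths   = c(1, 2, 3, 4, 6, 4, 4)
)

# Default: ordered by the mean of deaths per location
by_mean <- reorder(toy$location, toy$deaths)
levels(by_mean)

# Alternative: ordered by the total deaths per location
by_sum <- reorder(toy$location, toy$deaths, FUN = sum)
levels(by_sum)
```

Inside `aes()` this would be written as `x = reorder(location, deaths, FUN = sum)`.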
3- Box Plot
library("ggplot2")
library("dplyr")
`COVID-19`%>%
filter (Continent %in% c("Asia", "America", "Europe"))%>%
ggplot(aes(x= Continent, y = deaths, color = Continent))+
scale_y_log10()+ #this function transforms the y-axis scale to a log scale, i.e. base 10
geom_boxplot(size = 1)+ #geom_boxplot for boxplots
xlab("Continents")+
ylab("Deaths")+
ggtitle("Covid-19 Statistics")
![](https://static.wixstatic.com/media/56e222_77aab6203b504bf0b20d6136ed73dbe1~mv2.png/v1/fill/w_704,h_439,al_c,q_85,enc_auto/56e222_77aab6203b504bf0b20d6136ed73dbe1~mv2.png)
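A log scale cannot display zeros, because the logarithm of zero is not finite. ggplot2 typically drops such rows with a warning, so it can be clearer to filter them out explicitly. A minimal sketch on made-up values, where `toy` and `toy_pos` are illustrative names:

```r
library(dplyr)

toy <- data.frame(
  Continent = c("Asia", "Asia", "Europe", "Europe"),
  deaths    = c(0, 10, 100, 1000)
)

log10(toy$deaths)   # -Inf 1 2 3: zero has no finite logarithm

# Keep only positive counts before plotting on a log axis
toy_pos <- toy %>% filter(deaths > 0)
nrow(toy_pos)
```

Days with zero reported deaths are therefore absent from a log-scaled boxplot, which is worth keeping in mind when reading it.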
4- Line Graph
library("ggplot2")
library("dplyr")
library("RColorBrewer")
library("ggthemes") #to customize the plot theme
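`COVID-19`$dateRep #dateRep is the column name with dates for reported deaths and cases, while the $ sign extracts the data in this column
`COVID-19`$dateRep <- as.Date(`COVID-19`$dateRep) #as.Date stores the column as dates so the x-axis is ordered in time; the result must be assigned back to take effect (add format = "%d/%m/%Y" if the column was read as day/month/year text)
`COVID-19`%>%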
filter(location %in% c("Spain", "Italy"))%>%
ggplot(aes(x= dateRep, y = deaths, color = location))+
geom_line(size = 1)+
xlab("Months")+
ylab("Deaths")+
ggtitle("Covid-19 Statistics")+
theme_bw() #provides plot with white background
![](https://static.wixstatic.com/media/56e222_7243655e6b3347f380dd598a57394158~mv2.png/v1/fill/w_704,h_439,al_c,q_85,enc_auto/56e222_7243655e6b3347f380dd598a57394158~mv2.png)
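A line graph over time needs the x-axis column to be a real date rather than text. The sketch below shows the conversion in base R on made-up strings; the day/month/year text format is an assumption for illustration and may not match how the real file is read.

```r
# Made-up dates in day/month/year text form (the real file's format is an assumption)
raw_dates <- c("25/03/2020", "26/03/2020", "01/04/2020")

dates <- as.Date(raw_dates, format = "%d/%m/%Y")
class(dates)             # "Date"
format(dates, "%m")      # month number as text: "03" "03" "04"
diff(dates)              # gaps in days between consecutive reports
```

Once the column is a `Date`, ggplot2 chooses a date axis on its own and `geom_line()` connects the points in time order.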
5- Histogram
library("ggplot2")
library("dplyr")
`COVID-19`%>%
filter(location == "China")%>%
ggplot(aes(x = cases))+
scale_x_log10()+ #converts x-axis scale to log scale
scale_y_log10()+ #converts y-axis scale to log scale
geom_histogram(bins = 30, color = "black", fill = "lightblue")+ #color is the bar outline, fill is the bar interior
xlab("Cases")+
ylab("Count")+
ggtitle("Covid-19 Cases in China")
![](https://static.wixstatic.com/media/56e222_4da315840b254f799da6ad1f21e05395~mv2.png/v1/fill/w_704,h_439,al_c,q_85,enc_auto/56e222_4da315840b254f799da6ad1f21e05395~mv2.png)
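To see what `bins` does, the counts behind a histogram can be pulled out of the plot with `layer_data()`. This is a small sketch on made-up case counts, using 5 bins instead of 30 to keep the output short; `toy` and `bin_table` are illustrative names.

```r
library(ggplot2)

toy <- data.frame(cases = c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89))

p <- ggplot(toy, aes(x = cases)) +
  scale_x_log10() +
  geom_histogram(bins = 5, color = "black", fill = "lightblue")

bin_table <- layer_data(p)               # one row per bin
bin_table[, c("xmin", "xmax", "count")]  # bin edges are in log10 units
sum(bin_table$count)                     # every observation lands in exactly one bin
```

Changing `bins` changes how finely the same observations are grouped, not how many observations are counted.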