Time Series Data Analysis – 1

Time series data are a sequence of observations recorded in time order. Examples include daily temperature or precipitation over a year, or historical records spanning several years at monthly or daily resolution. Time series are widely used in econometrics, mathematical finance, weather forecasting, and related fields. Time series analysis and forecasting use statistical methods and models to extract meaningful information from the data or to predict future values. A time series can be:

1. Regular time series: the observations are separated by a fixed interval.
2. Irregular time series: there is no fixed interval between observations.
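Whether a series is regular can be checked from the gaps between consecutive timestamps. Here is a minimal sketch in base R using made-up dates; the vectors `regular_dates` and `irregular_dates` and the helper `is_regular()` are illustrative names, not part of the dataset used later.

```r
# Hypothetical example dates; not taken from the weather dataset used later.
regular_dates   <- seq(as.Date("2019-01-01"), by = "day", length.out = 5)
irregular_dates <- as.Date(c("2019-01-01", "2019-01-02", "2019-01-05", "2019-01-11"))

# A series is treated as regular when every gap between
# consecutive observations has the same length.
is_regular <- function(dates) {
  gaps <- diff(sort(dates))               # time differences between neighbours
  length(unique(as.numeric(gaps))) == 1   # exactly one distinct gap length
}

is_regular(regular_dates)    # TRUE
is_regular(irregular_dates)  # FALSE
```

Note that this simple test counts days, so a monthly series (gaps of 28 to 31 days) would be flagged as irregular even though it is regular in calendar terms.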

Loading the libraries

library(ggplot2)
library(dplyr)

##
## Attaching package: ‘dplyr’

## The following objects are masked from ‘package:stats’:
##
##     filter, lag

## The following objects are masked from ‘package:base’:
##
##     intersect, setdiff, setequal, union

library(lubridate)

##
## Attaching package: ‘lubridate’

## The following object is masked from ‘package:base’:
##
##     date

library(xts)

## Loading required package: zoo

##
## Attaching package: ‘zoo’

## The following objects are masked from ‘package:base’:
##
##     as.Date, as.Date.numeric

##
## Attaching package: ‘xts’

## The following objects are masked from ‘package:dplyr’:
##
##     first, last

Data

I obtained the data from NOAA (National Oceanic and Atmospheric Administration). The dataset is the Clemson_Florence 30 years weather data. You can import the file with RStudio's built-in import tool or, as here, with read.csv(). I named the data frame FL.

getwd()

## [1] "C:/Users/saura/Desktop"

FL <- read.csv("C:/Users/saura/Desktop/Florence_30years_Weather_Data.csv", header = TRUE)
str(FL)

## 'data.frame':    10456 obs. of  9 variables:
##  $ NAME     : Factor w/ 1 level "Clemson PDREC": 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE : num  34.3 34.3 34.3 34.3 34.3 ...
##  $ LONGITUDE: num  -79.7 -79.7 -79.7 -79.7 -79.7 ...
##  $ ELEVATION: int  125 125 125 125 125 125 125 125 125 125 ...
##  $ DATE     : Factor w/ 10456 levels "1/1/1991","1/1/1992",..: 840 782 4155 294 207 517 488 3848 477 3439 ...
##  $ PRCP_inch: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Avg_T_F  : Factor w/ 76 levels ".","0","19","20",..: 4 6 5 3 4 15 5 7 7 9 ...
##  $ Max_T_F  : int  32 35 33 26 28 51 31 34 35 39 ...
##  $ Min_T_F  : int  8 10 11 12 12 12 13 13 13 13 ...

Converting into Date variable or POSIXct format

As you can see, DATE is stored as a factor (on R 4.0 and later, read.csv() reads it as character by default). Before we can use it for analysis, we need to convert it to a Date or POSIXct variable. Here parse_date_time() from lubridate does the conversion.

FL$DATE <- parse_date_time(FL$DATE, orders = c("ymd", "dmy", "mdy"))
str(FL)

## 'data.frame':    10456 obs. of  9 variables:
##  $ NAME     : Factor w/ 1 level "Clemson PDREC": 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE : num  34.3 34.3 34.3 34.3 34.3 ...
##  $ LONGITUDE: num  -79.7 -79.7 -79.7 -79.7 -79.7 ...
##  $ ELEVATION: int  125 125 125 125 125 125 125 125 125 125 ...
##  $ DATE     : POSIXct, format: "2018-01-07" "2018-01-05" ...
##  $ PRCP_inch: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Avg_T_F  : Factor w/ 76 levels ".","0","19","20",..: 4 6 5 3 4 15 5 7 7 9 ...
##  $ Max_T_F  : int  32 35 33 26 28 51 31 34 35 39 ...
##  $ Min_T_F  : int  8 10 11 12 12 12 13 13 13 13 ...
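When the layout of the date strings is known in advance, naming it explicitly avoids any ambiguity between day-first and month-first orders. Below is a small sketch with base R, assuming the month/day/year layout suggested by the factor levels shown earlier. The vector `raw_dates` is made up for illustration and is not the real column.

```r
# Made-up strings in the month/day/year layout seen in the factor levels.
# The last one is deliberately invalid (there is no month 13).
raw_dates <- c("1/7/2018", "12/31/1991", "2/29/1992", "13/1/1991")

# as.Date() returns NA for strings that do not fit the stated format.
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

parsed               # the last element is NA
sum(is.na(parsed))   # number of strings that failed to parse
```

Counting the NA values after parsing is a quick way to confirm that every row was converted.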

Now you can see that DATE has changed from factor to POSIXct format.

Extracting the year and month from the Date variable

I used the mutate() function to store the year and month in two new columns, and named the new data frame FL2.

FL2 <- FL %>%
  mutate(Year = year(DATE), Month = month(DATE))

Plotting the data and doing the analysis

I grouped the data by year and plotted the mean of the maximum temperature for each year. To show the trend, I added geom_smooth(), which here uses its default smoother, LOESS (locally estimated scatterplot smoothing, a form of local polynomial regression).

# linetype = 2 draws a dashed line
FL2 %>% group_by(Year) %>% na.omit() %>% summarise(Mean_Max = mean(Max_T_F)) %>%
  ggplot(aes(Year, Mean_Max)) + geom_line(linetype = 2, size = 1) + geom_smooth() +
  scale_x_continuous(breaks = seq(1990, 2019, by = 1)) + xlab("Year") + ylab("Mean Maximum Temperature (°F)") +
  theme_bw() + theme(axis.text = element_text(size = 12), axis.title = element_text(size = 14))

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

The x-axis labels overlap one another. Rotating the text fixes this: run the code again with the angle argument of element_text() inside theme().

# linetype = 2 draws a dashed line; angle = 45 rotates the x-axis labels
FL2 %>% group_by(Year) %>% na.omit() %>% summarise(Mean_Max = mean(Max_T_F)) %>%
  ggplot(aes(Year, Mean_Max)) + geom_line(linetype = 2, size = 1) + geom_smooth() +
  scale_x_continuous(breaks = seq(1990, 2019, by = 1)) + xlab("Year") + ylab("Mean Maximum Temperature (°F)") +
  theme_bw() + theme(axis.text = element_text(size = 12), axis.title = element_text(size = 14)) +
  theme(axis.text.x = element_text(angle = 45))

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
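geom_smooth() fits the LOESS curve internally. To get a feel for what that fit does, the same kind of smoother can be run directly with loess() from base R's stats package. This is only a sketch on simulated yearly means (the `toy` data frame and its values are invented), not a reproduction of the plot from the weather data.

```r
set.seed(1)  # make the simulated values reproducible

# Invented yearly means: a slow upward drift plus random noise.
toy <- data.frame(Year = 1990:2019)
toy$Mean_Max <- 74 + 0.05 * (toy$Year - 1990) + rnorm(nrow(toy), sd = 1)

# Local polynomial regression; span is the share of points used in each local fit.
fit <- loess(Mean_Max ~ Year, data = toy, span = 0.75)

# One smoothed value per year, comparable to the curve geom_smooth() draws.
toy$Smoothed <- predict(fit)
head(toy)
```

A smaller span follows the yearly ups and downs more closely, while a larger one gives a flatter trend line.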

Boxplot

Let's draw a boxplot to see the temperature distribution for each year in more detail.

FL2 %>% ggplot(aes(as.factor(Year), Max_T_F)) + geom_boxplot(fill = "light green") +
  theme_bw() + ylab("Maximum Temperature (°F)") + xlab("Year") +
  theme(axis.text = element_text(size = 12), axis.title = element_text(size = 14)) +
  theme(axis.text.x = element_text(angle = 45))
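Each box summarises one year's values by their quartiles. As a rough illustration on invented numbers (the vector `temps` is hypothetical), the median and the edges of a box can be computed with quantile():

```r
# Invented maximum temperatures (degrees F) for a single year.
temps <- c(52, 58, 61, 67, 70, 74, 79, 83, 88, 91, 95)

# Lower edge of the box, median, and upper edge of the box.
q <- quantile(temps, probs = c(0.25, 0.5, 0.75))
q

# Interquartile range; the whiskers reach at most 1.5 * IQR beyond the box.
iqr <- IQR(temps)
iqr
```

Points that fall outside the whiskers are the ones geom_boxplot() draws individually as outliers.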

Filtering the data for specific months

Suppose I want to restrict the data to specific months, for example the growing season of a particular crop, which runs from May to September. I will select those months with the filter() function of dplyr. The months can be extracted directly from the DATE column with lubridate's month() function, or we can use the Month column created earlier with mutate().

FL2 %>% group_by(Year) %>% na.omit() %>%
  filter(month(DATE) >= 5 & month(DATE) <= 9) %>%
  summarise(Mean_SeasonalT = mean(Max_T_F)) %>%
  ggplot(aes(Year, Mean_SeasonalT)) + geom_line(size = 0.8) +
  scale_x_continuous(breaks = seq(1990, 2019, by = 1)) +
  geom_point() + geom_smooth() + ylab("Mean of Maximum Seasonal Temperature (°F)") +
  theme(axis.text = element_text(size = 10), axis.title = element_text(size = 12)) +
  theme(axis.text.x = element_text(angle = 45))

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Minimum Temperature

Now let's plot the minimum temperature for Florence over the years. This time I use the Month column to filter the data.

FL2 %>% group_by(Year) %>% na.omit() %>% filter(Month >= 5 & Month <= 9) %>%
  summarise(Mean_SeasonalT = mean(Min_T_F)) %>%
  ggplot(aes(Year, Mean_SeasonalT)) + geom_line(size = 0.8) +
  scale_x_continuous(breaks = seq(1990, 2019, by = 1)) +
  geom_point() + geom_smooth() + ylab("Mean of Minimum Seasonal Temperature (°F)") +
  theme(axis.text = element_text(size = 10), axis.title = element_text(size = 12)) +
  theme(axis.text.x = element_text(angle = 45))

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Precipitation

Let's make another line plot, this time of the mean growing-season precipitation for each year.

FL2 %>% group_by(Year) %>% na.omit() %>% filter(month(DATE) >= 5 & month(DATE) <= 9) %>%
  summarise(Mean_SeasonalT = mean(PRCP_inch)) %>%
  ggplot(aes(Year, Mean_SeasonalT)) + geom_line(size = 0.8) +
  scale_x_continuous(breaks = seq(1990, 2019, by = 1)) +
  geom_point() + geom_smooth() + ylab("Mean Seasonal Precipitation (inch)") +
  theme(axis.text = element_text(size = 10), axis.title = element_text(size = 12)) +
  theme(axis.text.x = element_text(angle = 45))

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
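The code above averages precipitation over the season. Depending on the question, the seasonal total may be a more natural quantity for rainfall. Here is a hedged sketch of that alternative using base R's aggregate() on invented data. The `toy` data frame is hypothetical; only its column names mirror those in FL2.

```r
# Invented records for a few growing-season days in two years.
toy <- data.frame(
  Year      = c(2018, 2018, 2018, 2019, 2019, 2019),
  Month     = c(5, 6, 6, 5, 5, 9),
  PRCP_inch = c(0.0, 1.2, 0.3, 0.5, 0.0, 2.0)
)

# Keep May to September, then compute both the mean and the total per year.
season     <- toy[toy$Month >= 5 & toy$Month <= 9, ]
mean_prcp  <- aggregate(PRCP_inch ~ Year, data = season, FUN = mean)
total_prcp <- aggregate(PRCP_inch ~ Year, data = season, FUN = sum)

mean_prcp
total_prcp
```

The mean is pulled down by the many dry days in a season, so the two summaries can tell different stories about the same year.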

Data for last 10 years

Now suppose I only want to plot roughly the last 10 years, to see how the temperature has changed during the growing season from May to September. I filter on both Year and Month (the code keeps 2008 through 2019) and use facet_wrap() to show each year in its own panel.

FL2 %>% na.omit() %>% group_by(Year, Month) %>%
  filter(Month >= 5 & Month <= 9, Year >= 2008 & Year <= 2019) %>%
  summarise(Mean_MaxT = mean(Max_T_F)) %>% ggplot(aes(Month, Mean_MaxT)) +
  geom_line() + geom_point() + facet_wrap(~Year) + ylab("Mean Maximum Temperature (°F)")

Look at the x-axis: the months appear as the numbers 5 to 9, but I want them shown as names. The scale_x_continuous() function with its labels argument does this:

FL2 %>% na.omit() %>% group_by(Year, Month) %>%
  filter(Month >= 5 & Month <= 9, Year >= 2008 & Year <= 2019) %>%
  summarise(Mean_MaxT = mean(Max_T_F)) %>% ggplot(aes(Month, Mean_MaxT)) +
  geom_line() + geom_point() + facet_wrap(~Year) + ylab("Mean Maximum Temperature (°F)") +
  # set the breaks explicitly so that each label lines up with its month number
  scale_x_continuous(breaks = 5:9, labels = c("May", "June", "July", "August", "September")) +
  theme(axis.text.x = element_text(angle = 45))
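The month labels do not have to be typed by hand. R ships with the built-in constants month.name and month.abb, which can be indexed by month number. A short sketch follows; the variable `season_months` is an illustrative name.

```r
# Month numbers of the growing season used in this post.
season_months <- 5:9

# Built-in character vectors of English month names, indexed by month number.
full_labels  <- month.name[season_months]
short_labels <- month.abb[season_months]

full_labels
short_labels
```

Either vector could be passed as labels to scale_x_continuous() together with breaks = season_months, and the abbreviated names leave more room on a crowded axis.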
