class: center, middle, inverse, title-slide

# Introduction to Time Series
### Dr. Tackett
### 10.30.2018

---

class: regular

## Announcements

- HW 5 due Thursday, 11/1
- Lab 7 due Sunday, 11/4
- Project Proposal due 11/13
    + Example posters in office

---

class: regular

## Packages

```r
library(knitr)
library(broom)
library(dplyr)
library(tibble)
library(ggplot2)
library(cowplot)
```

---

class: middle, center

## Examples of Time Series Data

---

class: regular

## Gas Prices

<center>
<img src="gasprices.png" width="90%" style="display: block; margin: auto;" />
<a href="https://www.gasbuddy.com/Charts" target="_blank">gasbuddy.com</a>
</center>

---

class: regular

## Global Land-Ocean Temperature Index

<center>
<img src="GlobalTemp.png" width="90%" style="display: block; margin: auto;" />
<a href="https://climate.nasa.gov/vital-signs/global-temperature/" target="_blank">NASA.gov</a>
</center>

---

class: middle, center

## Stocks

<a href="https://www.google.com/search?rlz=1C5CHFA_enUS812US814&tbm=fin&q=NASDAQ:+AAPL&stick=H4sIAAAAAAAAAONgecRoyi3w8sc9YSmdSWtOXmNU4-IKzsgvd80rySypFJLgYoOy-KR4uLj0c_UNzKtyk8rSeADviEaCOgAAAA&biw=1219&bih=1169#scso=_zp_YW7edDdGxggeTqLqYCw1:0" target="_blank">Apple's Stock Price</a>

---

class: middle, center

## Popular Music

<a href="http://research.google.com/bigpicture/music/#" target="_blank">Google Music Timeline</a>

---

class: regular

## Time Series

- Regression assumes **<u>independent</u>** errors across observations
- When data are ordered over time, the error in one period may influence the error in another period
    + Such data are called <font class="vocab">time series data</font>
- We assume observations are measured at equally spaced time points
- We will do a brief introduction to time series analysis
    + Take *STA 444: Statistical Modeling of Spatial and Time Series Data* for more in-depth study of the subject

---

class: regular

## Example: Detecting Melanoma

- Incidence of melanoma (skin cancer) is related to solar radiation
- <font class="vocab">Question: </font> Is
there evidence that melanoma incidence is related to sunspot activity in the same year or to sunspot activity in the previous year?
- <font class="vocab">Data: </font> Age-adjusted melanoma incidence among males from the Connecticut Tumor Registry, 1936 - 1972, and annual sunspot activity
    + `ex1514` data in `Sleuth3` package

---

class: regular

## Example: Detecting Melanoma

- <font class="vocab">`Year`: </font> 1936 - 1972
- <font class="vocab">`Melanoma`: </font> Age-adjusted melanoma incidence per 100,000 males
- <font class="vocab">`Sunspot`: </font> Measure of annual sunspot activity

---

class: regular

## Example: Detecting Melanoma

```r
library(Sleuth3)
cancer_sun <- ex1514
glimpse(cancer_sun)
```

```
## Observations: 37
## Variables: 3
## $ Year     <int> 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944,...
## $ Melanoma <dbl> 1.0, 0.9, 0.8, 1.4, 1.2, 1.0, 1.5, 1.9, 1.5, 1.5, 1.5...
## $ Sunspot  <int> 40, 115, 100, 80, 60, 40, 23, 10, 10, 25, 75, 145, 13...
```

---

class: regular

## Data Exploration

<img src="15_time_series_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

---

## Data Exploration

<img src="15_time_series_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---

## Melanoma vs. Sunspot

```r
model <- lm(Melanoma ~ Sunspot, data=cancer_sun)
kable(tidy(model),format="markdown",digits=3)
```

|term        | estimate| std.error| statistic| p.value|
|:-----------|--------:|---------:|---------:|-------:|
|(Intercept) |    2.522|     0.371|     6.796|   0.000|
|Sunspot     |    0.003|     0.004|     0.715|   0.479|

```r
glance(model)$r.squared
```

```
## [1] 0.01439734
```

---

## Residuals vs. Sunspot

<img src="15_time_series_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

<center>
<font class="vocab3">No concerning pattern</font>
</center>

---

class: regular

## Residuals vs. Year

<img src="15_time_series_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

<center>
<font class="vocab3">Serial correlation!</font>
</center>

---

class: regular

## Measuring Serial Correlation

- We can compute the correlation between residuals at time `\(t\)` and time `\(t-k\)`
- `\(k\)`: how far back you wish to go
    + This is called the <font class="vocab3">lag</font>
- A common measure is the <font class="vocab3">lag <i>k</i> autocorrelation coefficient: </font>

`$$r_k = \frac{c_k}{c_0} = \frac{\sum\limits_{t=1+k}^n (\text{res}_t \times \text{res}_{t-k})/(n-1)}{\sum\limits_{t=1}^n \text{res}^2_t / (n-1)}$$`

---

class: regular

## Autocorrelation

- We will focus on *lag 1* autocorrelation, i.e. `\(k=1\)`

`$$r_1 = \frac{c_1}{c_0} = \frac{\sum\limits_{t=2}^n (\text{res}_t \times \text{res}_{t-1})/(n-1)}{\sum\limits_{t=1}^n \text{res}^2_t / (n-1)}$$`

- `\(c_1\)` and `\(c_0\)` are estimates of <font class="vocab3"> autocovariance: </font> the covariance between the residual series and itself one time point apart ( `\(c_1\)` ) and zero time points apart, i.e. the variance ( `\(c_0\)` )
- `\(r_1\)` is an estimate of the <font class="vocab3"> autocorrelation: </font> the correlation between residuals at time `\(t\)` and time `\(t-1\)`
    + `\(-1 \leq r_1 \leq 1\)`

---

class: regular

### Melanoma vs. Sunspots - Autocorrelation

<table>
<thead>
<tr> <th style="text-align:right;"> Year </th> <th style="text-align:right;"> resid_lag1 </th> <th style="text-align:right;"> resid_current </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 1937 </td> <td style="text-align:right;"> -1.644 </td> <td style="text-align:right;"> -1.974 </td> </tr>
<tr> <td style="text-align:right;"> 1938 </td> <td style="text-align:right;"> -1.974 </td> <td style="text-align:right;"> -2.028 </td> </tr>
<tr> <td style="text-align:right;"> 1939 </td> <td style="text-align:right;"> -2.028 </td> <td style="text-align:right;"> -1.367 </td> </tr>
<tr> <td style="text-align:right;"> 1940 </td> <td style="text-align:right;"> -1.367 </td> <td style="text-align:right;"> -1.506 </td> </tr>
<tr> <td style="text-align:right;"> 1941 </td> <td style="text-align:right;"> -1.506 </td> <td style="text-align:right;"> -1.644 </td> </tr>
<tr> <td style="text-align:right;"> 1942 </td> <td style="text-align:right;"> -1.644 </td> <td style="text-align:right;"> -1.093 </td> </tr>
</tbody>
</table>

```r
cancer_sun$Residuals <- resid(model)
resid_current <- cancer_sun$Residuals[-1]               # residuals, 1937 - 1972
resid_lag1 <- cancer_sun$Residuals[-nrow(cancer_sun)]   # residuals, 1936 - 1971
sum(resid_current * resid_lag1) / sum(cancer_sun$Residuals^2)  # lag 1 autocorrelation
```

```
## [1] 0.8597797
```

<font class="vocab3">What is one way to account for the year-to-year impact?</font>

---

class: regular

## Melanoma vs. Sunspots & Year

```r
# Add Year to the model
model_v2 <- lm(Melanoma ~ Sunspot + Year, data=cancer_sun)
kable(tidy(model_v2),format="html",digits=3)
```

<table>
<thead>
<tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> -225.108 </td> <td style="text-align:right;"> 13.257 </td> <td style="text-align:right;"> -16.981 </td> <td style="text-align:right;"> 0.00 </td> </tr>
<tr> <td style="text-align:left;"> Sunspot </td> <td style="text-align:right;"> 0.001 </td> <td style="text-align:right;"> 0.001 </td> <td style="text-align:right;"> 1.074 </td> <td style="text-align:right;"> 0.29 </td> </tr>
<tr> <td style="text-align:left;"> Year </td> <td style="text-align:right;"> 0.117 </td> <td style="text-align:right;"> 0.007 </td> <td style="text-align:right;"> 17.172 </td> <td style="text-align:right;"> 0.00 </td> </tr>
</tbody>
</table>

```r
glance(model_v2)$r.squared
```

```
## [1] 0.8981024
```

---

class: regular

## Melanoma vs. Sunspots & Year

<img src="15_time_series_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

---

class: regular

## Autocorrelation

<img src="15_time_series_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

```
## [1] "autocorrelation: 0.377"
```

---

class: regular

## Autoregressive Model

- There are many models that account for serial correlation in the error terms
- A common model is the <font class="vocab2">autoregressive (AR) model</font>
- If we have no explanatory variables, the AR model with one lag (<font class="vocab3">AR(1) model</font>) is

`$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \epsilon_t \hspace{10mm} \epsilon_t \sim N(0,\sigma^2)$$`

---

class: regular

### Example: AR(1) Model

<table>
<thead>
<tr> <th style="text-align:right;"> Year </th> <th style="text-align:right;"> Melanoma </th> <th style="text-align:right;"> melanoma_prev </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 1937 </td> <td style="text-align:right;"> 0.9 </td> <td style="text-align:right;"> 1.0 </td> </tr>
<tr> <td style="text-align:right;"> 1938 </td> <td style="text-align:right;"> 0.8 </td> <td style="text-align:right;"> 0.9 </td> </tr>
<tr> <td style="text-align:right;"> 1939 </td> <td style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0.8 </td> </tr>
<tr> <td style="text-align:right;"> 1940 </td> <td style="text-align:right;"> 1.2 </td> <td style="text-align:right;"> 1.4 </td> </tr>
<tr> <td style="text-align:right;"> 1941 </td> <td style="text-align:right;"> 1.0 </td> <td style="text-align:right;"> 1.2 </td> </tr>
<tr> <td style="text-align:right;"> 1942 </td> <td style="text-align:right;"> 1.5 </td> <td style="text-align:right;"> 1.0 </td> </tr>
</tbody>
</table>

---

class: regular

## Example: AR(1) Model

```r
# Lagged response: Melanoma from the previous year (first year has none, so drop it)
melanoma_data <- cancer_sun %>% mutate(melanoma_prev = dplyr::lag(Melanoma)) %>% slice(-1)
model_ar1 <- lm(Melanoma ~ melanoma_prev,data=melanoma_data)
kable(tidy(model_ar1),format="html",digits=3)
```

<table>
<thead>
<tr> <th style="text-align:left;"> term </th> <th
style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 0.216 </td> <td style="text-align:right;"> 0.186 </td> <td style="text-align:right;"> 1.164 </td> <td style="text-align:right;"> 0.253 </td> </tr>
<tr> <td style="text-align:left;"> melanoma_prev </td> <td style="text-align:right;"> 0.964 </td> <td style="text-align:right;"> 0.063 </td> <td style="text-align:right;"> 15.317 </td> <td style="text-align:right;"> 0.000 </td> </tr>
</tbody>
</table>

```r
glance(model_ar1)$r.squared
```

```
## [1] 0.8734264
```

---

class: regular

## Residual Plots

<img src="15_time_series_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" />

```
## [1] "autocorrelation: -0.145"
```

---

class: regular

## Next Steps

- We created a model that explains a lot of the variation in melanoma incidence `\((\approx 87\%)\)`
- We greatly reduced the serial correlation in the residuals `\((\approx 0.86 \text{ to } \approx -0.15)\)`
- However, the model we created isn't ideal... **why not?**

---

class: regular

### AR(1) Model: One Explanatory Variable

- If we want to use an explanatory variable, the **AR(1)** model takes the general form:

<br>

`$$\begin{aligned}&Y_t = \beta_0 + \beta_1 X_t + \epsilon_t\\ &\epsilon_t = \alpha \epsilon_{t-1} + \delta_t, \hspace{10mm} \delta_t \sim N(0,\sigma^2)\end{aligned}$$`

<br>

- `\(\alpha\)` is the lag 1 autocorrelation of the errors `\(\epsilon_t\)`
    + We can estimate `\(\alpha\)` using `\(r_1\)`

---

### Example: Use sunspot from previous year

<table>
<thead>
<tr> <th style="text-align:right;"> Sunspot </th> <th style="text-align:right;"> sunspot_prev </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 115 </td> <td style="text-align:right;"> 40 </td> </tr>
<tr> <td style="text-align:right;"> 100 </td> <td
style="text-align:right;"> 115 </td> </tr>
<tr> <td style="text-align:right;"> 80 </td> <td style="text-align:right;"> 100 </td> </tr>
<tr> <td style="text-align:right;"> 60 </td> <td style="text-align:right;"> 80 </td> </tr>
<tr> <td style="text-align:right;"> 40 </td> <td style="text-align:right;"> 60 </td> </tr>
<tr> <td style="text-align:right;"> 23 </td> <td style="text-align:right;"> 40 </td> </tr>
</tbody>
</table>

---

class: regular

### Example: Melanoma vs. sunspot_prev

```r
# Lagged predictor: sunspot activity from the previous year (first year has none, so drop it)
lag1_data <- cancer_sun %>% mutate(sunspot_prev = dplyr::lag(Sunspot)) %>% slice(-1)
model_lag1 <- lm(Melanoma ~ sunspot_prev,data=lag1_data)
kable(tidy(model_lag1),format="html",digits=3)
```

<table>
<thead>
<tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 2.398 </td> <td style="text-align:right;"> 0.363 </td> <td style="text-align:right;"> 6.604 </td> <td style="text-align:right;"> 0.000 </td> </tr>
<tr> <td style="text-align:left;"> sunspot_prev </td> <td style="text-align:right;"> 0.006 </td> <td style="text-align:right;"> 0.004 </td> <td style="text-align:right;"> 1.330 </td> <td style="text-align:right;"> 0.192 </td> </tr>
</tbody>
</table>

```r
glance(model_lag1)$r.squared
```

```
## [1] 0.04943778
```

---

class: regular

### Example: Melanoma vs. sunspot_prev

<img src="15_time_series_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" />

```
## [1] "autocorrelation: 0.841"
```

---

class: regular

## Example: Add Year to the Model

```r
# Control for year
model_lag1_v2 <- lm(Melanoma ~ sunspot_prev + Year,data=lag1_data)
kable(tidy(model_lag1_v2),format="html",digits=3)
```

<table>
<thead>
<tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> -226.916 </td> <td style="text-align:right;"> 12.445 </td> <td style="text-align:right;"> -18.234 </td> <td style="text-align:right;"> 0.000 </td> </tr>
<tr> <td style="text-align:left;"> sunspot_prev </td> <td style="text-align:right;"> 0.004 </td> <td style="text-align:right;"> 0.001 </td> <td style="text-align:right;"> 3.065 </td> <td style="text-align:right;"> 0.004 </td> </tr>
<tr> <td style="text-align:left;"> Year </td> <td style="text-align:right;"> 0.117 </td> <td style="text-align:right;"> 0.006 </td> <td style="text-align:right;"> 18.428 </td> <td style="text-align:right;"> 0.000 </td> </tr>
</tbody>
</table>

```r
glance(model_lag1_v2)$r.squared
```

```
## [1] 0.9158057
```

---

class: regular

## Residual Plots

<img src="15_time_series_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" />

```
## [1] "autocorrelation: 0.165"
```

---

### Practice: Lynx Trappings vs. Sunspots

- `ex1515` in `Sleuth3` package
    + <font class="vocab">`Year`</font>: 1821 - 1934
    + <font class="vocab">`Lynx`</font>: Number of lynx trapped
    + <font class="vocab">`Sunspot`</font>: Measure of sunspot activity
- <font class="vocab">`Question`</font>: Is there evidence that the number of lynx trapped is related to sunspot activity?
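---

class: regular

## Appendix: Lag k Autocorrelation in R

A minimal sketch, not part of the original analysis, of the lag `\(k\)` autocorrelation coefficient `\(r_k = c_k / c_0\)` from the "Measuring Serial Correlation" slide. The helper `lag_autocorr()` is hypothetical, and the data are simulated (a linear trend plus independent noise) rather than the melanoma data. Because residuals from a model with an intercept have mean zero, the result should match base R's `acf()`, which centers the series before computing the same ratio.

```r
# Simulated data (illustrative only): linear trend plus independent noise
set.seed(2018)
x <- 1:50
y <- 3 + 0.2 * x + rnorm(50)
res <- resid(lm(y ~ x))

# Hypothetical helper: r_k = c_k / c_0 (the (n - 1) divisors cancel)
lag_autocorr <- function(res, k = 1) {
  n <- length(res)
  sum(res[(1 + k):n] * res[1:(n - k)]) / sum(res^2)
}

r1 <- lag_autocorr(res, k = 1)
r2 <- lag_autocorr(res, k = 2)

# acf() stores lags 0, 1, 2, ..., so lag k is element k + 1
acf_vals <- acf(res, lag.max = 2, plot = FALSE)$acf
c(r1 = r1, acf_lag1 = acf_vals[2])
```

Since the simulated noise is independent, `r1` should be near zero up to sampling variation. Calling `lag_autocorr(resid(model))` on the melanoma fit performs the same computation as the residual autocorrelation code on the earlier slide.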
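---

class: regular

## Appendix: Fitting an AR(1) Model with `lm()`

A minimal sketch, under assumptions not taken from the slides, of how the AR(1) model `\(Y_t = \beta_0 + \beta_1 Y_{t-1} + \epsilon_t\)` can be fit by ordinary least squares once a lagged copy of the response exists, as in the `melanoma_prev` example. The series is simulated with arbitrary illustrative values `\(\beta_0 = 1\)`, `\(\beta_1 = 0.7\)`, `\(\sigma = 1\)`, so the estimated slope can be checked against a known truth; `ar_data` and `fit_ar1` are hypothetical names.

```r
set.seed(444)
n <- 1000
beta0 <- 1
beta1 <- 0.7

# Simulate Y_t = beta0 + beta1 * Y_{t-1} + eps_t with eps_t ~ N(0, 1)
y <- numeric(n)
y[1] <- beta0 / (1 - beta1)  # start at the long-run mean of the process
for (t in 2:n) {
  y[t] <- beta0 + beta1 * y[t - 1] + rnorm(1)
}

# Pair each Y_t with Y_{t-1}; the first time point has no previous value
ar_data <- data.frame(y = y[-1], y_prev = y[-n])
fit_ar1 <- lm(y ~ y_prev, data = ar_data)
coef(fit_ar1)
```

The slope on `y_prev` should land near 0.7. Dropping the first observation mirrors how the melanoma AR(1) table starts in 1937 rather than 1936.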
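---

class: regular

## Appendix: Regression with AR(1) Errors

The slides estimate `\(\alpha\)` with `\(r_1\)` from ordinary least squares residuals. This sketch compares that with one alternative the slides do not cover: base R's `arima()` with `order = c(1, 0, 0)` and an `xreg` argument, which fits `\(Y_t = \beta_0 + \beta_1 X_t + \epsilon_t\)`, `\(\epsilon_t = \alpha \epsilon_{t-1} + \delta_t\)` by maximum likelihood. The data are simulated with arbitrary values `\(\beta_0 = 2\)`, `\(\beta_1 = 0.5\)`, `\(\alpha = 0.6\)`, `\(\sigma = 1\)`; they are not the melanoma data.

```r
set.seed(1030)
n <- 500
alpha <- 0.6
x <- rnorm(n, mean = 10, sd = 2)

# AR(1) errors: eps_t = alpha * eps_{t-1} + delta_t, delta_t ~ N(0, 1)
eps <- numeric(n)
eps[1] <- rnorm(1, sd = 1 / sqrt(1 - alpha^2))  # stationary starting value
for (t in 2:n) {
  eps[t] <- alpha * eps[t - 1] + rnorm(1)
}
y <- 2 + 0.5 * x + eps

# Step 1: ordinary least squares, then estimate alpha with r_1
ols <- lm(y ~ x)
res <- resid(ols)
r1 <- sum(res[-1] * res[-n]) / sum(res^2)

# Step 2: fit the regression and the AR(1) error term jointly
fit <- arima(y, order = c(1, 0, 0), xreg = cbind(x = x))
round(c(r1 = r1, coef(fit)), 3)
```

In this simulation `r1` and the `ar1` coefficient should both be near 0.6 and the `x` coefficient near 0.5; `arima()` reports the constant term as `intercept`.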