1 Inštalácia knižníc v R-ku

Začneme s inštaláciou knižníc potrebných na toto cvičenie. Spustením nasledovného príkazu sa nainštalujú tie z knižníc, ktoré nainštalované nie sú, a zároveň sa aj načítajú.

# názvy balíkov, ktoré budú potrebné pre cvičenie 1
packages <- c("astsa", "urca", "datasets", "ggplot2", "forecast")

# inštalácia balíkov, ak nie sú nainštalované
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages],dependencies=TRUE)
}

# načítanie balíkov
invisible(lapply(packages, library, character.only = TRUE))

2 Dáta na samostatnú prácu

Zadanie a otázky sú vždy rovnaké ako v modelovom príklade.

2.1 Dáta BJsales z knižnice datasets

Popis dát z helpu: The sales time series BJsales and leading indicator BJsales.lead each contain 150 observations.

y <- window(BJsales, end=130)
plot(y)

Vidíme, že v dátach je trend, takže ich určite treba aspoň jedenkrát diferencovať.

plot(diff(y))

Testujme prítomnosť jednotkového koreňa v diferenciách. Keďže v pôvodných dátach bol trend, v teste jednotkového koreňa pre diferencie volíme type=“drift”. Model pre diferencie teda bude obsahovať konštantu. Konštantný člen v modeli pre diferencie totiž znamená lineárny trend v pôvodných dátach a tento trend sa dostane aj do predikcií.

summary(ur.df(diff(y), lags = 5, selectlags = "BIC", type = "drift"))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3193 -0.9485 -0.0214  0.7786  3.4804 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.26137    0.13789   1.895   0.0604 .  
## z.lag.1     -0.54178    0.10373  -5.223  7.5e-07 ***
## z.diff.lag  -0.20066    0.08856  -2.266   0.0253 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.43 on 120 degrees of freedom
## Multiple R-squared:  0.3702, Adjusted R-squared:  0.3597 
## F-statistic: 35.27 on 2 and 120 DF,  p-value: 8.962e-13
## 
## 
## Value of test-statistic is: -5.2231 13.6411 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1  6.52  4.63  3.81

Testová štatistika (-5.2231) je menšia ako kritická hodnota (-2.88), preto sa hypotéza o jednotkovom koreni zamieta. Dáta ďalej netreba diferencovať. Diferencie sú stacionárne.

Ideme hľadať ARMA model pre diferencie:

acf2(diff(y))

##      [,1] [,2] [,3] [,4] [,5] [,6]  [,7] [,8]  [,9] [,10] [,11] [,12] [,13]
## ACF  0.32 0.28 0.25 0.27 0.16 0.14  0.06 0.14 -0.03  0.00  0.08 -0.02 -0.05
## PACF 0.32 0.20 0.14 0.14 0.00 0.00 -0.07 0.08 -0.12 -0.02  0.11 -0.07 -0.04
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22]
## ACF  -0.08 -0.09  0.03 -0.01 -0.06  0.03  0.11 -0.01  0.00
## PACF -0.07 -0.06  0.11  0.05 -0.04  0.06  0.15 -0.09 -0.06

Tipy: Buď AR(2) alebo MA(4).

ar2 <- sarima(y, 2,1,0)  # rezidua OK, BIC = 3.661291 
## initial  value 0.415208 
## iter   2 value 0.349355
## iter   3 value 0.341581
## iter   4 value 0.341492
## iter   5 value 0.341491
## iter   6 value 0.341491
## iter   7 value 0.341491
## iter   8 value 0.341491
## iter   8 value 0.341491
## final  value 0.341491 
## converged
## initial  value 0.336421 
## iter   2 value 0.336391
## iter   3 value 0.336362
## iter   4 value 0.336361
## iter   5 value 0.336361
## iter   5 value 0.336361
## iter   5 value 0.336361
## final  value 0.336361 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##          Estimate     SE t.value p.value
## ar1        0.2510 0.0858  2.9259  0.0041
## ar2        0.2022 0.0858  2.3572  0.0200
## constant   0.4292 0.2232  1.9229  0.0568
## 
## sigma^2 estimated as 1.956716 on 126 degrees of freedom 
##  
## AIC = 3.572615  AICc = 3.574103  BIC = 3.661291 
## 

ma4 <- sarima(y, 0,1,4)  #rezidua OK, BIC = 3.723177 , 
## initial  value 0.409765 
## iter   2 value 0.348396
## iter   3 value 0.334337
## iter   4 value 0.330026
## iter   5 value 0.329306
## iter   6 value 0.329274
## iter   7 value 0.329273
## iter   8 value 0.329272
## iter   9 value 0.329272
## iter  10 value 0.329272
## iter  10 value 0.329272
## iter  10 value 0.329272
## final  value 0.329272 
## converged
## initial  value 0.329639 
## iter   2 value 0.329633
## iter   3 value 0.329631
## iter   4 value 0.329631
## iter   5 value 0.329631
## iter   6 value 0.329631
## iter   6 value 0.329631
## iter   6 value 0.329631
## final  value 0.329631 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##          Estimate     SE t.value p.value
## ma1        0.2059 0.0897  2.2959  0.0234
## ma2        0.1570 0.0917  1.7126  0.0893
## ma3        0.1584 0.0771  2.0536  0.0421
## ma4        0.1665 0.0792  2.1019  0.0376
## constant   0.4320 0.2049  2.1083  0.0370
## 
## sigma^2 estimated as 1.930203 on 124 degrees of freedom 
##  
## AIC = 3.590162  AICc = 3.593944  BIC = 3.723177 
## 

AR(2) má menšie BIC. Overíme ešte stacionaritu.

Všeobecný postup:

koef <- ar2$fit$coef
koef_ar <- koef[grep("^ar",names(koef))]

Absolútne hodnoty koreňov polynómu potom sú

abs(polyroot(c(1,-koef_ar)))
## [1] 1.688394 2.929791

Obe sú väčšie ako 1, proces je stacionárny (aj invertovateľný lebo je to AR).

Predikcie:

sarima.for(y, 20,2,1,0)
## $pred
## Time Series:
## Start = 131 
## End = 150 
## Frequency = 1 
##  [1] 257.4086 257.6503 257.9677 258.3309 258.7210 259.1270 259.5425 259.9636
##  [9] 260.3880 260.8144 261.2419 261.6701 262.0988 262.5277 262.9567 263.3858
## [17] 263.8149 264.2441 264.6733 265.1025
## 
## $se
## Time Series:
## Start = 131 
## End = 150 
## Frequency = 1 
##  [1]  1.398827  2.240262  3.084861  3.838836  4.527822  5.154490  5.729799
##  [8]  6.261398  6.756340  7.220211  7.657601  8.072206  8.467030  8.844530
## [15]  9.206724  9.555292  9.891636 10.216944 10.532227 10.838352
lines(window(BJsales, start=131), col="blue")

2.2 Dáta WWWusage z knižnice datasets

Popis dát z helpu: A time series of the numbers of users connected to the Internet through a server every minute.

y <- window(WWWusage, start = 1, end = 90)
plot(y)

V dátach nie je lineárny trend. Testujme prítomnosť jednotkového koreňa. Stredná hodnota je niekde okolo 140, preto volíme type = “drift”.

summary(ur.df(y, lags = 5, selectlags = "BIC", type = "drift"))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.6658 -2.0290 -0.1395  1.7978  6.7602 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.29086    1.57387   2.726  0.00789 ** 
## z.lag.1     -0.03143    0.01192  -2.637  0.01006 *  
## z.diff.lag1  1.09678    0.10462  10.483  < 2e-16 ***
## z.diff.lag2 -0.57526    0.14746  -3.901  0.00020 ***
## z.diff.lag3  0.35540    0.10656   3.335  0.00130 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.076 on 79 degrees of freedom
## Multiple R-squared:  0.7334, Adjusted R-squared:  0.7199 
## F-statistic: 54.33 on 4 and 79 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -2.6373 3.7222 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.51 -2.89 -2.58
## phi1  6.70  4.71  3.86

Testová štatistika (-2.6373) je väčšia ako kritická hodnota (-2.89), preto sa hypotéza o jednotkovom koreni nezamieta. Dáta treba diferencovať.

plot(diff(y))

Keďže v pôvodných dátach nebol trend, v teste jednotkového koreňa pre diferencie volíme type=“none”. Model pre diferencie potom nebude obsahovať konštantu. Konštantný člen v modeli pre diferencie by totiž znamenal lineárny trend v pôvodných dátach a tento trend by sa dostal aj do predikcií, čo nechceme.

summary(ur.df(diff(y), lags = 5, selectlags = "BIC", type = "none"))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2924 -1.4973 -0.1364  2.2677  7.3313 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## z.lag.1     -0.15977    0.06698  -2.386  0.01942 * 
## z.diff.lag1  0.30561    0.10238   2.985  0.00376 **
## z.diff.lag2 -0.30490    0.10703  -2.849  0.00558 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.17 on 80 degrees of freedom
## Multiple R-squared:  0.2411, Adjusted R-squared:  0.2126 
## F-statistic: 8.472 on 3 and 80 DF,  p-value: 5.914e-05
## 
## 
## Value of test-statistic is: -2.3856 
## 
## Critical values for test statistics: 
##      1pct  5pct 10pct
## tau1 -2.6 -1.95 -1.61

Testová štatistika (-2.3856) je menšia ako kritická hodnota (-1.95), preto sa hypotéza o jednotkovom koreni zamieta. Dáta ďalej netreba diferencovať, sú stacionárne.

Ideme hľadať ARMA model:

acf2(diff(y))

##      [,1]  [,2] [,3]  [,4]  [,5]  [,6]  [,7]  [,8] [,9] [,10] [,11] [,12] [,13]
## ACF   0.8  0.55 0.43  0.37  0.29  0.19  0.10  0.02 0.01  0.03  0.04 -0.03 -0.14
## PACF  0.8 -0.24 0.24 -0.03 -0.06 -0.04 -0.08 -0.03 0.08  0.03  0.02 -0.18 -0.15
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## ACF  -0.22 -0.24 -0.22 -0.24 -0.31 -0.36 -0.37
## PACF -0.08 -0.02  0.07 -0.11 -0.09 -0.10 -0.08

Poďla PACF to vyzerá na AR(3). Vyskúšame aj zmiešaný ARMA model.

ar3 <- sarima(y, 3,1,0, no.constant = TRUE) # rezidua OK, BIC = 5.308413 
## initial  value 1.761982 
## iter   2 value 1.534797
## iter   3 value 1.378519
## iter   4 value 1.251812
## iter   5 value 1.184973
## iter   6 value 1.143419
## iter   7 value 1.129477
## iter   8 value 1.128519
## iter   9 value 1.128517
## iter  10 value 1.128516
## iter  11 value 1.128515
## iter  12 value 1.128515
## iter  13 value 1.128515
## iter  13 value 1.128515
## iter  13 value 1.128515
## final  value 1.128515 
## converged
## initial  value 1.134577 
## iter   2 value 1.134475
## iter   3 value 1.134420
## iter   4 value 1.134400
## iter   5 value 1.134400
## iter   5 value 1.134400
## iter   5 value 1.134400
## final  value 1.134400 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##     Estimate     SE t.value p.value
## ar1   1.1193 0.1028 10.8882  0.0000
## ar2  -0.5846 0.1460 -4.0041  0.0001
## ar3   0.2945 0.1019  2.8902  0.0049
## 
## sigma^2 estimated as 9.50564 on 86 degrees of freedom 
##  
## AIC = 5.196565  AICc = 5.199737  BIC = 5.308413 
## 

arma11 <- sarima(y, 1,1,1, no.constant = TRUE) # rezidua OK, BIC = 5.285761 
## initial  value 1.750658 
## iter   2 value 1.342460
## iter   3 value 1.178228
## iter   4 value 1.166667
## iter   5 value 1.146411
## iter   6 value 1.143394
## iter   7 value 1.142611
## iter   8 value 1.142174
## iter   9 value 1.142169
## iter  10 value 1.142169
## iter  10 value 1.142169
## iter  10 value 1.142169
## final  value 1.142169 
## converged
## initial  value 1.148300 
## iter   2 value 1.148294
## iter   3 value 1.148292
## iter   4 value 1.148291
## iter   4 value 1.148291
## iter   4 value 1.148291
## final  value 1.148291 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##     Estimate     SE t.value p.value
## ar1   0.6652 0.0893  7.4535       0
## ma1   0.4870 0.1022  4.7656       0
## 
## sigma^2 estimated as 9.783222 on 87 degrees of freedom 
##  
## AIC = 5.201874  AICc = 5.203442  BIC = 5.285761 
## 

Menšie BIC má ARMA(1,1) pre prvé diferencie.

Overíme ešte stacionaritu a invertovateľnosť.

Stacionarita:

koef <- arma11$fit$coef
koef_ar <- koef[grep("^ar",names(koef))]

Absolútna hodnota koreňa polynómu potom je

abs(polyroot(c(1,-koef_ar)))
## [1] 1.503241

Teda viac ako 1, proces je stacionárny.

Invertovateľnosť:

koef_ma <- koef[grep("^ma",names(koef))]

Absolútna hodnota koreňa polynómu potom je

abs(polyroot(c(1,koef_ma)))
## [1] 2.053574

Teda viac ako 1, proces je invertovateľný.

Predikcie:

sarima.for(y, 10,1,1,1, no.constant = TRUE)
## $pred
## Time Series:
## Start = 91 
## End = 100 
## Frequency = 1 
##  [1] 187.3193 190.8578 193.2117 194.7776 195.8193 196.5123 196.9732 197.2799
##  [9] 197.4839 197.6196
## 
## $se
## Time Series:
## Start = 91 
## End = 100 
## Frequency = 1 
##  [1]  3.127814  7.422812 11.765919 15.919698 19.806972 23.416397 26.763551
##  [8] 29.873741 32.774245 35.490944
lines(window(WWWusage, start=91), col="blue")

2.3 Dáta faithful z knižnice datasets

Popis dát z helpu: Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA.

y <- ts(faithful[, 1])
y <- window(y, end=250)
plot(y)

V dátach nie je trend. Stredná hodnota niekde okolo 3.5 => type=“drift”.

summary(ur.df(y, lags = 5, selectlags = "BIC", type = "drift"))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7041 -0.7983  0.1068  0.7382  1.9661 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.55920    0.40299  13.795   <2e-16 ***
## z.lag.1     -1.59387    0.11419 -13.958   <2e-16 ***
## z.diff.lag   0.01476    0.06449   0.229    0.819    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9478 on 241 degrees of freedom
## Multiple R-squared:  0.7851, Adjusted R-squared:  0.7833 
## F-statistic: 440.3 on 2 and 241 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -13.958 97.4137 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1  6.52  4.63  3.81

Testová štatistika (-13.958) je menšia ako kritická hodnota (-2.88), preto sa hypotéza o jednotkovom koreni zamieta. Dáta ďalej netreba diferencovať.

Ideme hľadať sarima model:

acf2(y)

##       [,1] [,2]  [,3]  [,4]  [,5]  [,6]  [,7] [,8]  [,9] [,10] [,11] [,12]
## ACF  -0.57 0.32 -0.09 -0.02  0.05 -0.06  0.01 0.01 -0.05 -0.01  0.04 -0.06
## PACF -0.57 0.00  0.14 -0.03 -0.02 -0.04 -0.05 0.01 -0.03 -0.08  0.03  0.01
##      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF   0.04  0.01 -0.03  0.02  0.02 -0.06  0.13 -0.12  0.10 -0.06  0.02 -0.04
## PACF -0.02  0.03 -0.01 -0.04  0.04 -0.05  0.08  0.01  0.01  0.00  0.00 -0.07
##      [,25] [,26]
## ACF  -0.01  0.06
## PACF -0.07  0.08

Buď MA(2) alebo AR(1) (možno AR(3)?).

ma2 <- sarima(y, 0,0,2)  # rezidua OK (prve na hrane), BIC = 2.80667 
## initial  value 0.133829 
## iter   2 value -0.058892
## iter   3 value -0.059848
## iter   4 value -0.060042
## iter   5 value -0.060042
## iter   6 value -0.060042
## iter   6 value -0.060042
## iter   6 value -0.060042
## final  value -0.060042 
## converged
## initial  value -0.059769 
## iter   2 value -0.059774
## iter   3 value -0.059775
## iter   3 value -0.059775
## iter   3 value -0.059775
## final  value -0.059775 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##       Estimate     SE t.value p.value
## ma1    -0.5257 0.0587 -8.9525       0
## ma2     0.3327 0.0582  5.7190       0
## xmean   3.4764 0.0480 72.4058       0
## 
## sigma^2 estimated as 0.8858874 on 247 degrees of freedom 
##  
## AIC = 2.750327  AICc = 2.750717  BIC = 2.80667 
## 

ar1 <- sarima(y, 1,0,0)  # rezidua OK, BIC = 2.787634 
## initial  value 0.135811 
## iter   2 value -0.057035
## iter   3 value -0.057038
## iter   4 value -0.057038
## iter   4 value -0.057038
## iter   4 value -0.057038
## final  value -0.057038 
## converged
## initial  value -0.058246 
## iter   2 value -0.058250
## iter   3 value -0.058250
## iter   4 value -0.058250
## iter   4 value -0.058250
## iter   4 value -0.058250
## final  value -0.058250 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##       Estimate     SE  t.value p.value
## ar1    -0.5641 0.0520 -10.8394       0
## xmean   3.4785 0.0382  91.1252       0
## 
## sigma^2 estimated as 0.8886673 on 248 degrees of freedom 
##  
## AIC = 2.745377  AICc = 2.745571  BIC = 2.787634 
## 

ar3 <- sarima(y, 3,0,0)  # rezidua OK, BIC = 2.813068 , nesignifikantny ar2 koeficient
## initial  value 0.135472 
## iter   2 value 0.044159
## iter   3 value -0.069498
## iter   4 value -0.070881
## iter   5 value -0.071163
## iter   6 value -0.071196
## iter   7 value -0.071196
## iter   8 value -0.071196
## iter   8 value -0.071196
## iter   8 value -0.071196
## final  value -0.071196 
## converged
## initial  value -0.067479 
## iter   2 value -0.067612
## iter   3 value -0.067619
## iter   4 value -0.067619
## iter   4 value -0.067619
## iter   4 value -0.067619
## final  value -0.067619 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##       Estimate     SE t.value p.value
## ar1    -0.5641 0.0626 -9.0135  0.0000
## ar2     0.0772 0.0725  1.0648  0.2880
## ar3     0.1370 0.0630  2.1754  0.0306
## xmean   3.4773 0.0438 79.4660  0.0000
## 
## sigma^2 estimated as 0.8719722 on 246 degrees of freedom 
##  
## AIC = 2.742639  AICc = 2.743292  BIC = 2.813068 
## 

AR(1) má najmenšie BIC. Overíme ešte stacionaritu.

koef <- ar1$fit$coef
koef_ar <- koef[grep("^ar",names(koef))]

Absolútne hodnoty koreňov polynómu potom sú

abs(polyroot(c(1,-koef_ar)))
## [1] 1.772661

Je stacionárny (aj inverovateľný - lebo je to AR).

Predikcie:

sarima.for(y, 22, 1,0,0)
## $pred
## Time Series:
## Start = 251 
## End = 272 
## Frequency = 1 
##  [1] 2.986922 3.755866 3.322087 3.566792 3.428748 3.506622 3.462691 3.487474
##  [9] 3.473493 3.481380 3.476931 3.479441 3.478025 3.478824 3.478373 3.478627
## [17] 3.478484 3.478565 3.478519 3.478545 3.478530 3.478538
## 
## $se
## Time Series:
## Start = 251 
## End = 272 
## Frequency = 1 
##  [1] 0.9426915 1.0823459 1.1231524 1.1358310 1.1398362 1.1411079 1.1415123
##  [8] 1.1416409 1.1416819 1.1416949 1.1416990 1.1417004 1.1417008 1.1417009
## [15] 1.1417010 1.1417010 1.1417010 1.1417010 1.1417010 1.1417010 1.1417010
## [22] 1.1417010
lines(window(ts(faithful[, 1]), start=251), col="blue")

2.4 Dáta blood z knižnice astsa

Popis dát z helpu: Multiple time series of measurements made for 91 days on the three variables, log(white blood count) [WBC], log(platelet) [PLT] and hematocrit [HCT]. Missing data code is NA.

y <- ts(blood[1:36, 3])
plot(y)

V dátach nie je trend. Stredná hodnota niekde okolo 30 => type=“drift”

summary(ur.df(y, lags = 5, selectlags = "BIC", type = "drift"))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.7026 -0.5769  0.3090  1.0144  4.5225 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  19.1631     5.8934   3.252  0.00307 **
## z.lag.1      -0.6117     0.1876  -3.260  0.00301 **
## z.diff.lag    0.2079     0.1842   1.129  0.26889   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.096 on 27 degrees of freedom
## Multiple R-squared:  0.2877, Adjusted R-squared:  0.235 
## F-statistic: 5.454 on 2 and 27 DF,  p-value: 0.01025
## 
## 
## Value of test-statistic is: -3.2597 5.3131 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.58 -2.93 -2.60
## phi1  7.06  4.86  3.94

Testová štatistika (-3.2597) je menšia ako kritická hodnota (-2.93), preto sa hypotéza o jednotkovom koreni zamieta. Dáta ďalej netreba diferencovať.

acf2(y)

##      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]
## ACF  0.45  0.04 -0.05 -0.03 -0.14 -0.11 -0.24 -0.28 -0.19
## PACF 0.45 -0.21  0.03 -0.02 -0.16  0.04 -0.29 -0.09 -0.05

AR(1) alebo MA(1).

ar1 <- sarima(y,1,0,0) # BIC = 4.589554 
## initial  value 0.849044 
## iter   2 value 0.732591
## iter   3 value 0.732571
## iter   4 value 0.732571
## iter   4 value 0.732571
## iter   4 value 0.732571
## final  value 0.732571 
## converged
## initial  value 0.726939 
## iter   2 value 0.726534
## iter   3 value 0.726525
## iter   4 value 0.726525
## iter   4 value 0.726525
## iter   4 value 0.726525
## final  value 0.726525 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##       Estimate     SE t.value p.value
## ar1     0.4457 0.1462  3.0484  0.0044
## xmean  31.3191 0.6065 51.6374  0.0000
## 
## sigma^2 estimated as 4.249914 on 34 degrees of freedom 
##  
## AIC = 4.457594  AICc = 4.467695  BIC = 4.589554 
## 

ma1 <- sarima(y,0,0,1)    # BIC = 4.562321 
## initial  value 0.839586 
## iter   2 value 0.712250
## iter   3 value 0.712236
## iter   4 value 0.711713
## iter   5 value 0.711710
## iter   6 value 0.711709
## iter   6 value 0.711709
## final  value 0.711709 
## converged
## initial  value 0.712931 
## iter   2 value 0.712922
## iter   3 value 0.712909
## iter   3 value 0.712909
## iter   3 value 0.712909
## final  value 0.712909 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##       Estimate     SE t.value p.value
## ma1      0.478 0.1240  3.8551   5e-04
## xmean   31.319 0.4963 63.0993   0e+00
## 
## sigma^2 estimated as 4.131375 on 34 degrees of freedom 
##  
## AIC = 4.430361  AICc = 4.440462  BIC = 4.562321 
## 

MA(1) má menšie BIC.

Overíme ešte invertovateľnosť.

koef <- ma1$fit$coef
koef_ma <- koef[grep("^ma",names(koef))]

Absolútne hodnoty koreňom polynómu potom sú

abs(polyroot(c(1,koef_ma)))
## [1] 2.09189

Je inverovateľný (aj stacionárny - lebo je to MA).

Predikcie:

sarima.for(y, 5, 0,0,1)

## $pred
## Time Series:
## Start = 37 
## End = 41 
## Frequency = 1 
## [1] 31.55728 31.31903 31.31903 31.31903 31.31903
## 
## $se
## Time Series:
## Start = 37 
## End = 41 
## Frequency = 1 
## [1] 2.032579 2.252881 2.252881 2.252881 2.252881

2.5 Dáta gnp z knižnice astsa

Popis dát z helpu: Seasonally adjusted quarterly U.S. GNP from 1947(1) to 2002(3).

y <- log(gnp)
plot(y)

Vidíme, že v dátach je trend, teda nie sú stacionárne. Preto dáta diferencujeme.

plot(diff(y))
abline(h=0, col="red")
abline(h=mean(diff(y)), col="blue")

Keďže v pôvodných dátach bol trend, ADF test pre diferencie bude typu “drift” a v modeli pre diferencie bude konštantný člen.

summary(ur.df(diff(y), lags = 5, selectlags = "BIC", type = "drift"))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.028862 -0.004989 -0.000281  0.005427  0.038233 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0049476  0.0009199   5.378 1.97e-07 ***
## z.lag.1     -0.5952198  0.0778643  -7.644 7.09e-13 ***
## z.diff.lag  -0.0849017  0.0682341  -1.244    0.215    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.009616 on 213 degrees of freedom
## Multiple R-squared:  0.3302, Adjusted R-squared:  0.324 
## F-statistic: 52.51 on 2 and 213 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -7.6443 29.2178 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1  6.52  4.63  3.81

Testová štatistika (-7.6443) je menšia ako kritická hodnota (-2.88), preto sa hypotéza o jednotkovom koreni zamieta. Dáta ďalej netreba diferencovať.

acf2(diff(y))

##      [,1] [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8] [,9] [,10] [,11] [,12] [,13]
## ACF  0.35 0.19 -0.01 -0.12 -0.17 -0.11 -0.09 -0.04 0.04  0.05  0.03 -0.12 -0.13
## PACF 0.35 0.08 -0.11 -0.12 -0.09  0.01 -0.03 -0.02 0.05  0.01 -0.03 -0.17 -0.06
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF  -0.10 -0.11  0.05  0.07  0.10  0.06  0.07 -0.09 -0.05 -0.10 -0.05  0.00
## PACF  0.02 -0.06  0.10  0.00  0.02 -0.04  0.01 -0.11  0.03 -0.03  0.00  0.01

Z PACF to vyzerá na AR(1), z ACF na MA(2). V modeli bude zahrnutá konštanta, keďže v pôvodných dátach bol rastúci trend.

ar1 <- sarima(y,1,1,0) # rezidua tesne OK, BIC = -6.400957 
## initial  value -4.589567 
## iter   2 value -4.654150
## iter   3 value -4.654150
## iter   4 value -4.654151
## iter   4 value -4.654151
## iter   4 value -4.654151
## final  value -4.654151 
## converged
## initial  value -4.655919 
## iter   2 value -4.655921
## iter   3 value -4.655921
## iter   4 value -4.655922
## iter   5 value -4.655922
## iter   5 value -4.655922
## iter   5 value -4.655922
## final  value -4.655922 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##          Estimate     SE t.value p.value
## ar1        0.3467 0.0627  5.5255       0
## constant   0.0083 0.0010  8.5398       0
## 
## sigma^2 estimated as 9.029576e-05 on 220 degrees of freedom 
##  
## AIC = -6.446939  AICc = -6.446692  BIC = -6.400957 
## 

ma2 <- sarima(y,0,1,2) # rezidua OK, BIC = -6.388822 
## initial  value -4.591629 
## iter   2 value -4.661095
## iter   3 value -4.662220
## iter   4 value -4.662243
## iter   5 value -4.662243
## iter   6 value -4.662243
## iter   6 value -4.662243
## iter   6 value -4.662243
## final  value -4.662243 
## converged
## initial  value -4.662021 
## iter   2 value -4.662022
## iter   2 value -4.662022
## iter   2 value -4.662022
## final  value -4.662022 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##          Estimate     SE t.value p.value
## ma1        0.3028 0.0654  4.6272  0.0000
## ma2        0.2035 0.0644  3.1593  0.0018
## constant   0.0083 0.0010  8.7177  0.0000
## 
## sigma^2 estimated as 8.919192e-05 on 219 degrees of freedom 
##  
## AIC = -6.450131  AICc = -6.449635  BIC = -6.388822 
## 

AR(1) model má nižšie BIC.

Overíme ešte stacionaritu.

koef <- ar1$fit$coef
koef_ar <- koef[grep("^ar",names(koef))]

Absolútna hodnota koreňa polynómu je

abs(polyroot(c(1,-koef_ar)))
## [1] 2.884724

Proces je stacionárny (aj inverovateľný - lebo je to AR).

Predikcie:

sarima.for(y, 10,1,1,0)

## $pred
##          Qtr1     Qtr2     Qtr3     Qtr4
## 2002                            9.165886
## 2003 9.174510 9.182946 9.191316 9.199664
## 2004 9.208004 9.216342 9.224678 9.233014
## 2005 9.241350                           
## 
## $se
##             Qtr1        Qtr2        Qtr3        Qtr4
## 2002                                     0.009502408
## 2003 0.015938787 0.021173624 0.025569341 0.029380482
## 2004 0.032772142 0.035850982 0.038687708 0.041330887
## 2005 0.043815131

Zachováva sa rastúci trend.