Začneme s inštaláciou knižníc potrebných na toto cvičenie. Spustením nasledovného príkazu sa nainštalujú tie z knižníc, ktoré nainštalované nie sú, a zároveň sa aj načítajú.
# názvy balíkov, ktoré budú potrebné pre cvičenie 1
packages <- c("astsa", "urca", "datasets", "ggplot2", "forecast")
# inštalácia balíkov, ak nie sú nainštalované
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
install.packages(packages[!installed_packages],dependencies=TRUE)
}
# načítanie balíkov
invisible(lapply(packages, library, character.only = TRUE))
Zadanie a otázky sú vždy rovnaké ako v modelovom príklade.
datasets
Popis dát z helpu: The sales time series BJsales and leading indicator BJsales.lead each contain 150 observations.
y <- window(BJsales, end=130)
plot(y)
Vidíme, že v dátach je trend, takže ich určite treba aspoň jedenkrát diferencovať.
plot(diff(y))
Testujme prítomnosť jednotkového koreňa v diferenciách. Keďže v pôvodných dátach bol trend, v teste jednotkového koreňa pre diferencie volíme type=“drift”. Model pre diferencie teda bude obsahovať konštantu. Konštantný člen v modeli pre diferencie totiž znamená lineárny trend v pôvodných dátach a tento trend sa dostane aj do predikcií.
summary(ur.df(diff(y), lags = 5, selectlags = "BIC", type = "drift"))
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3193 -0.9485 -0.0214 0.7786 3.4804
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.26137 0.13789 1.895 0.0604 .
## z.lag.1 -0.54178 0.10373 -5.223 7.5e-07 ***
## z.diff.lag -0.20066 0.08856 -2.266 0.0253 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.43 on 120 degrees of freedom
## Multiple R-squared: 0.3702, Adjusted R-squared: 0.3597
## F-statistic: 35.27 on 2 and 120 DF, p-value: 8.962e-13
##
##
## Value of test-statistic is: -5.2231 13.6411
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1 6.52 4.63 3.81
Testová štatistika (-5.2231) je menšia ako kritická hodnota (-2.88), preto sa hypotéza o jednotkovom koreni zamieta. Dáta ďalej netreba diferencovať. Diferencie sú stacionárne.
Ideme hľadať ARMA model pre diferencie:
acf2(diff(y))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.32 0.28 0.25 0.27 0.16 0.14 0.06 0.14 -0.03 0.00 0.08 -0.02 -0.05
## PACF 0.32 0.20 0.14 0.14 0.00 0.00 -0.07 0.08 -0.12 -0.02 0.11 -0.07 -0.04
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22]
## ACF -0.08 -0.09 0.03 -0.01 -0.06 0.03 0.11 -0.01 0.00
## PACF -0.07 -0.06 0.11 0.05 -0.04 0.06 0.15 -0.09 -0.06
Tipy: Buď AR(2) alebo MA(4).
ar2 <- sarima(y, 2,1,0) # rezidua OK, BIC = 3.661291
## initial value 0.415208
## iter 2 value 0.349355
## iter 3 value 0.341581
## iter 4 value 0.341492
## iter 5 value 0.341491
## iter 6 value 0.341491
## iter 7 value 0.341491
## iter 8 value 0.341491
## iter 8 value 0.341491
## final value 0.341491
## converged
## initial value 0.336421
## iter 2 value 0.336391
## iter 3 value 0.336362
## iter 4 value 0.336361
## iter 5 value 0.336361
## iter 5 value 0.336361
## iter 5 value 0.336361
## final value 0.336361
## converged
## <><><><><><><><><><><><><><>
##
## Coefficients:
## Estimate SE t.value p.value
## ar1 0.2510 0.0858 2.9259 0.0041
## ar2 0.2022 0.0858 2.3572 0.0200
## constant 0.4292 0.2232 1.9229 0.0568
##
## sigma^2 estimated as 1.956716 on 126 degrees of freedom
##
## AIC = 3.572615 AICc = 3.574103 BIC = 3.661291
##
ma4 <- sarima(y, 0,1,4) #rezidua OK, BIC = 3.723177 ,
## initial value 0.409765
## iter 2 value 0.348396
## iter 3 value 0.334337
## iter 4 value 0.330026
## iter 5 value 0.329306
## iter 6 value 0.329274
## iter 7 value 0.329273
## iter 8 value 0.329272
## iter 9 value 0.329272
## iter 10 value 0.329272
## iter 10 value 0.329272
## iter 10 value 0.329272
## final value 0.329272
## converged
## initial value 0.329639
## iter 2 value 0.329633
## iter 3 value 0.329631
## iter 4 value 0.329631
## iter 5 value 0.329631
## iter 6 value 0.329631
## iter 6 value 0.329631
## iter 6 value 0.329631
## final value 0.329631
## converged
## <><><><><><><><><><><><><><>
##
## Coefficients:
## Estimate SE t.value p.value
## ma1 0.2059 0.0897 2.2959 0.0234
## ma2 0.1570 0.0917 1.7126 0.0893
## ma3 0.1584 0.0771 2.0536 0.0421
## ma4 0.1665 0.0792 2.1019 0.0376
## constant 0.4320 0.2049 2.1083 0.0370
##
## sigma^2 estimated as 1.930203 on 124 degrees of freedom
##
## AIC = 3.590162 AICc = 3.593944 BIC = 3.723177
##
AR(2) má menšie BIC. Overíme ešte stacionaritu.
Všeobecný postup:
koef <- ar2$fit$coef
koef_ar <- koef[grep("^ar",names(koef))]
Absolútne hodnoty koreňov polynómu potom sú
abs(polyroot(c(1,-koef_ar)))
## [1] 1.688394 2.929791
Obe sú väčšie ako 1, proces je stacionárny (aj invertovateľný lebo je to AR).
Predikcie:
sarima.for(y, 20,2,1,0)
## $pred
## Time Series:
## Start = 131
## End = 150
## Frequency = 1
## [1] 257.4086 257.6503 257.9677 258.3309 258.7210 259.1270 259.5425 259.9636
## [9] 260.3880 260.8144 261.2419 261.6701 262.0988 262.5277 262.9567 263.3858
## [17] 263.8149 264.2441 264.6733 265.1025
##
## $se
## Time Series:
## Start = 131
## End = 150
## Frequency = 1
## [1] 1.398827 2.240262 3.084861 3.838836 4.527822 5.154490 5.729799
## [8] 6.261398 6.756340 7.220211 7.657601 8.072206 8.467030 8.844530
## [15] 9.206724 9.555292 9.891636 10.216944 10.532227 10.838352
lines(window(BJsales, start=131), col="blue")
datasets
Popis dát z helpu: A time series of the numbers of users connected to the Internet through a server every minute.
y <- window(WWWusage, start = 1, end = 90)
plot(y)
V dátach nie je lineárny trend. Testujme prítomnosť jednotkového koreňa. Stredná hodnota je niekde okolo 140, preto volíme type = “drift”.
summary(ur.df(y, lags = 5, selectlags = "BIC", type = "drift"))
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.6658 -2.0290 -0.1395 1.7978 6.7602
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.29086 1.57387 2.726 0.00789 **
## z.lag.1 -0.03143 0.01192 -2.637 0.01006 *
## z.diff.lag1 1.09678 0.10462 10.483 < 2e-16 ***
## z.diff.lag2 -0.57526 0.14746 -3.901 0.00020 ***
## z.diff.lag3 0.35540 0.10656 3.335 0.00130 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.076 on 79 degrees of freedom
## Multiple R-squared: 0.7334, Adjusted R-squared: 0.7199
## F-statistic: 54.33 on 4 and 79 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -2.6373 3.7222
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.51 -2.89 -2.58
## phi1 6.70 4.71 3.86
Testová štatistika (-2.6373) je väčšia ako kritická hodnota (-2.89), preto sa hypotéza o jednotkovom koreni nezamieta. Dáta treba diferencovať.
plot(diff(y))
Keďže v pôvodných dátach nebol trend, v teste jednotkového koreňa pre diferencie volíme type=“none”. Model pre diferencie potom nebude obsahovať konštantu. Konštantný člen v modeli pre diferencie by totiž znamenal lineárny trend v pôvodných dátach a tento trend by sa dostal aj do predikcií, čo nechceme.
summary(ur.df(diff(y), lags = 5, selectlags = "BIC", type = "none"))
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.2924 -1.4973 -0.1364 2.2677 7.3313
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 -0.15977 0.06698 -2.386 0.01942 *
## z.diff.lag1 0.30561 0.10238 2.985 0.00376 **
## z.diff.lag2 -0.30490 0.10703 -2.849 0.00558 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.17 on 80 degrees of freedom
## Multiple R-squared: 0.2411, Adjusted R-squared: 0.2126
## F-statistic: 8.472 on 3 and 80 DF, p-value: 5.914e-05
##
##
## Value of test-statistic is: -2.3856
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.6 -1.95 -1.61
Testová štatistika (-2.3856) je menšia ako kritická hodnota (-1.95), preto sa hypotéza o jednotkovom koreni zamieta. Dáta ďalej netreba diferencovať, sú stacionárne.
Ideme hľadať ARMA model:
acf2(diff(y))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.8 0.55 0.43 0.37 0.29 0.19 0.10 0.02 0.01 0.03 0.04 -0.03 -0.14
## PACF 0.8 -0.24 0.24 -0.03 -0.06 -0.04 -0.08 -0.03 0.08 0.03 0.02 -0.18 -0.15
## [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## ACF -0.22 -0.24 -0.22 -0.24 -0.31 -0.36 -0.37
## PACF -0.08 -0.02 0.07 -0.11 -0.09 -0.10 -0.08
Poďla PACF to vyzerá na AR(3). Vyskúšame aj zmiešaný ARMA model.
ar3 <- sarima(y, 3,1,0, no.constant = TRUE) # rezidua OK, BIC = 5.308413
## initial value 1.761982
## iter 2 value 1.534797
## iter 3 value 1.378519
## iter 4 value 1.251812
## iter 5 value 1.184973
## iter 6 value 1.143419
## iter 7 value 1.129477
## iter 8 value 1.128519
## iter 9 value 1.128517
## iter 10 value 1.128516
## iter 11 value 1.128515
## iter 12 value 1.128515
## iter 13 value 1.128515
## iter 13 value 1.128515
## iter 13 value 1.128515
## final value 1.128515
## converged
## initial value 1.134577
## iter 2 value 1.134475
## iter 3 value 1.134420
## iter 4 value 1.134400
## iter 5 value 1.134400
## iter 5 value 1.134400
## iter 5 value 1.134400
## final value 1.134400
## converged
## <><><><><><><><><><><><><><>
##
## Coefficients:
## Estimate SE t.value p.value
## ar1 1.1193 0.1028 10.8882 0.0000
## ar2 -0.5846 0.1460 -4.0041 0.0001
## ar3 0.2945 0.1019 2.8902 0.0049
##
## sigma^2 estimated as 9.50564 on 86 degrees of freedom
##
## AIC = 5.196565 AICc = 5.199737 BIC = 5.308413
##
arma11 <- sarima(y, 1,1,1, no.constant = TRUE) # rezidua OK, BIC = 5.285761
## initial value 1.750658
## iter 2 value 1.342460
## iter 3 value 1.178228
## iter 4 value 1.166667
## iter 5 value 1.146411
## iter 6 value 1.143394
## iter 7 value 1.142611
## iter 8 value 1.142174
## iter 9 value 1.142169
## iter 10 value 1.142169
## iter 10 value 1.142169
## iter 10 value 1.142169
## final value 1.142169
## converged
## initial value 1.148300
## iter 2 value 1.148294
## iter 3 value 1.148292
## iter 4 value 1.148291
## iter 4 value 1.148291
## iter 4 value 1.148291
## final value 1.148291
## converged
## <><><><><><><><><><><><><><>
##
## Coefficients:
## Estimate SE t.value p.value
## ar1 0.6652 0.0893 7.4535 0
## ma1 0.4870 0.1022 4.7656 0
##
## sigma^2 estimated as 9.783222 on 87 degrees of freedom
##
## AIC = 5.201874 AICc = 5.203442 BIC = 5.285761
##
Menšie BIC má ARMA(1,1) pre prvé diferencie.
Overíme ešte stacionaritu a invertovateľnosť.
Stacionarita:
koef <- arma11$fit$coef
koef_ar <- koef[grep("^ar",names(koef))]
Absolútna hodnota koreňa polynómu potom je
abs(polyroot(c(1,-koef_ar)))
## [1] 1.503241
Teda viac ako 1, proces je stacionárny.
Invertovateľnosť:
koef_ma <- koef[grep("^ma",names(koef))]
Absolútna hodnota koreňa polynómu potom je
abs(polyroot(c(1,koef_ma)))
## [1] 2.053574
Teda viac ako 1, proces je invertovateľný.
Predikcie:
sarima.for(y, 10,1,1,1, no.constant = TRUE)
## $pred
## Time Series:
## Start = 91
## End = 100
## Frequency = 1
## [1] 187.3193 190.8578 193.2117 194.7776 195.8193 196.5123 196.9732 197.2799
## [9] 197.4839 197.6196
##
## $se
## Time Series:
## Start = 91
## End = 100
## Frequency = 1
## [1] 3.127814 7.422812 11.765919 15.919698 19.806972 23.416397 26.763551
## [8] 29.873741 32.774245 35.490944
lines(window(WWWusage, start=91), col="blue")
datasets
Popis dát z helpu: Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA.
y <- ts(faithful[, 1])
y <- window(y, end=250)
plot(y)
V dátach nie je trend. Stredná hodnota niekde okolo 3.5 => type=“drift”.
summary(ur.df(y, lags = 5, selectlags = "BIC", type = "drift"))
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.7041 -0.7983 0.1068 0.7382 1.9661
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.55920 0.40299 13.795 <2e-16 ***
## z.lag.1 -1.59387 0.11419 -13.958 <2e-16 ***
## z.diff.lag 0.01476 0.06449 0.229 0.819
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9478 on 241 degrees of freedom
## Multiple R-squared: 0.7851, Adjusted R-squared: 0.7833
## F-statistic: 440.3 on 2 and 241 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -13.958 97.4137
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1 6.52 4.63 3.81
Testová štatistika (-13.958) je menšia ako kritická hodnota (-2.88), preto sa hypotéza o jednotkovom koreni zamieta. Dáta ďalej netreba diferencovať.
Ideme hľadať sarima model:
acf2(y)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF -0.57 0.32 -0.09 -0.02 0.05 -0.06 0.01 0.01 -0.05 -0.01 0.04 -0.06
## PACF -0.57 0.00 0.14 -0.03 -0.02 -0.04 -0.05 0.01 -0.03 -0.08 0.03 0.01
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF 0.04 0.01 -0.03 0.02 0.02 -0.06 0.13 -0.12 0.10 -0.06 0.02 -0.04
## PACF -0.02 0.03 -0.01 -0.04 0.04 -0.05 0.08 0.01 0.01 0.00 0.00 -0.07
## [,25] [,26]
## ACF -0.01 0.06
## PACF -0.07 0.08
Buď MA(2) alebo AR(1) (možno AR(3)?).
ma2 <- sarima(y, 0,0,2) # rezidua OK (prve na hrane), BIC = 2.80667
## initial value 0.133829
## iter 2 value -0.058892
## iter 3 value -0.059848
## iter 4 value -0.060042
## iter 5 value -0.060042
## iter 6 value -0.060042
## iter 6 value -0.060042
## iter 6 value -0.060042
## final value -0.060042
## converged
## initial value -0.059769
## iter 2 value -0.059774
## iter 3 value -0.059775
## iter 3 value -0.059775
## iter 3 value -0.059775
## final value -0.059775
## converged
## <><><><><><><><><><><><><><>
##
## Coefficients:
## Estimate SE t.value p.value
## ma1 -0.5257 0.0587 -8.9525 0
## ma2 0.3327 0.0582 5.7190 0
## xmean 3.4764 0.0480 72.4058 0
##
## sigma^2 estimated as 0.8858874 on 247 degrees of freedom
##
## AIC = 2.750327 AICc = 2.750717 BIC = 2.80667
##
ar1 <- sarima(y, 1,0,0) # rezidua OK, BIC = 2.787634
## initial value 0.135811
## iter 2 value -0.057035
## iter 3 value -0.057038
## iter 4 value -0.057038
## iter 4 value -0.057038
## iter 4 value -0.057038
## final value -0.057038
## converged
## initial value -0.058246
## iter 2 value -0.058250
## iter 3 value -0.058250
## iter 4 value -0.058250
## iter 4 value -0.058250
## iter 4 value -0.058250
## final value -0.058250
## converged
## <><><><><><><><><><><><><><>
##
## Coefficients:
## Estimate SE t.value p.value
## ar1 -0.5641 0.0520 -10.8394 0
## xmean 3.4785 0.0382 91.1252 0
##
## sigma^2 estimated as 0.8886673 on 248 degrees of freedom
##
## AIC = 2.745377 AICc = 2.745571 BIC = 2.787634
##
ar3 <- sarima(y, 3,0,0) # rezidua OK, BIC = 2.813068 , nesignifikantny ar2 koeficient
## initial value 0.135472
## iter 2 value 0.044159
## iter 3 value -0.069498
## iter 4 value -0.070881
## iter 5 value -0.071163
## iter 6 value -0.071196
## iter 7 value -0.071196
## iter 8 value -0.071196
## iter 8 value -0.071196
## iter 8 value -0.071196
## final value -0.071196
## converged
## initial value -0.067479
## iter 2 value -0.067612
## iter 3 value -0.067619
## iter 4 value -0.067619
## iter 4 value -0.067619
## iter 4 value -0.067619
## final value -0.067619
## converged
## <><><><><><><><><><><><><><>
##
## Coefficients:
## Estimate SE t.value p.value
## ar1 -0.5641 0.0626 -9.0135 0.0000
## ar2 0.0772 0.0725 1.0648 0.2880
## ar3 0.1370 0.0630 2.1754 0.0306
## xmean 3.4773 0.0438 79.4660 0.0000
##
## sigma^2 estimated as 0.8719722 on 246 degrees of freedom
##
## AIC = 2.742639 AICc = 2.743292 BIC = 2.813068
##
AR(1) má najmenšie BIC. Overíme ešte stacionaritu.
koef <- ar1$fit$coef
koef_ar <- koef[grep("^ar",names(koef))]
Absolútne hodnoty koreňov polynómu potom sú
abs(polyroot(c(1,-koef_ar)))
## [1] 1.772661
Je stacionárny (aj inverovateľný - lebo je to AR).
Predikcie:
sarima.for(y, 22, 1,0,0)
## $pred
## Time Series:
## Start = 251
## End = 272
## Frequency = 1
## [1] 2.986922 3.755866 3.322087 3.566792 3.428748 3.506622 3.462691 3.487474
## [9] 3.473493 3.481380 3.476931 3.479441 3.478025 3.478824 3.478373 3.478627
## [17] 3.478484 3.478565 3.478519 3.478545 3.478530 3.478538
##
## $se
## Time Series:
## Start = 251
## End = 272
## Frequency = 1
## [1] 0.9426915 1.0823459 1.1231524 1.1358310 1.1398362 1.1411079 1.1415123
## [8] 1.1416409 1.1416819 1.1416949 1.1416990 1.1417004 1.1417008 1.1417009
## [15] 1.1417010 1.1417010 1.1417010 1.1417010 1.1417010 1.1417010 1.1417010
## [22] 1.1417010
lines(window(ts(faithful[, 1]), start=251), col="blue")
astsa
Popis dát z helpu: Multiple time series of measurements made for 91 days on the three variables, log(white blood count) [WBC], log(platelet) [PLT] and hematocrit [HCT]. Missing data code is NA.
y <- ts(blood[1:36, 3])
plot(y)
V dátach nie je trend. Stredná hodnota niekde okolo 30 => type=“drift”
summary(ur.df(y, lags = 5, selectlags = "BIC", type = "drift"))
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.7026 -0.5769 0.3090 1.0144 4.5225
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.1631 5.8934 3.252 0.00307 **
## z.lag.1 -0.6117 0.1876 -3.260 0.00301 **
## z.diff.lag 0.2079 0.1842 1.129 0.26889
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.096 on 27 degrees of freedom
## Multiple R-squared: 0.2877, Adjusted R-squared: 0.235
## F-statistic: 5.454 on 2 and 27 DF, p-value: 0.01025
##
##
## Value of test-statistic is: -3.2597 5.3131
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.58 -2.93 -2.60
## phi1 7.06 4.86 3.94
Testová štatistika (-3.2597) je menšia ako kritická hodnota (-2.93), preto sa hypotéza o jednotkovom koreni zamieta. Dáta ďalej netreba diferencovať.
acf2(y)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## ACF 0.45 0.04 -0.05 -0.03 -0.14 -0.11 -0.24 -0.28 -0.19
## PACF 0.45 -0.21 0.03 -0.02 -0.16 0.04 -0.29 -0.09 -0.05
AR(1) alebo MA(1).
ar1 <- sarima(y,1,0,0) # BIC = 4.589554
## initial value 0.849044
## iter 2 value 0.732591
## iter 3 value 0.732571
## iter 4 value 0.732571
## iter 4 value 0.732571
## iter 4 value 0.732571
## final value 0.732571
## converged
## initial value 0.726939
## iter 2 value 0.726534
## iter 3 value 0.726525
## iter 4 value 0.726525
## iter 4 value 0.726525
## iter 4 value 0.726525
## final value 0.726525
## converged
## <><><><><><><><><><><><><><>
##
## Coefficients:
## Estimate SE t.value p.value
## ar1 0.4457 0.1462 3.0484 0.0044
## xmean 31.3191 0.6065 51.6374 0.0000
##
## sigma^2 estimated as 4.249914 on 34 degrees of freedom
##
## AIC = 4.457594 AICc = 4.467695 BIC = 4.589554
##
ma1 <- sarima(y,0,0,1) # BIC = 4.562321
## initial value 0.839586
## iter 2 value 0.712250
## iter 3 value 0.712236
## iter 4 value 0.711713
## iter 5 value 0.711710
## iter 6 value 0.711709
## iter 6 value 0.711709
## final value 0.711709
## converged
## initial value 0.712931
## iter 2 value 0.712922
## iter 3 value 0.712909
## iter 3 value 0.712909
## iter 3 value 0.712909
## final value 0.712909
## converged
## <><><><><><><><><><><><><><>
##
## Coefficients:
## Estimate SE t.value p.value
## ma1 0.478 0.1240 3.8551 5e-04
## xmean 31.319 0.4963 63.0993 0e+00
##
## sigma^2 estimated as 4.131375 on 34 degrees of freedom
##
## AIC = 4.430361 AICc = 4.440462 BIC = 4.562321
##
MA(1) má menšie BIC.
Overíme ešte invertovateľnosť.
koef <- ma1$fit$coef
koef_ma <- koef[grep("^ma",names(koef))]
Absolútne hodnoty koreňom polynómu potom sú
abs(polyroot(c(1,koef_ma)))
## [1] 2.09189
Je inverovateľný (aj stacionárny - lebo je to MA).
Predikcie:
sarima.for(y, 5, 0,0,1)
## $pred
## Time Series:
## Start = 37
## End = 41
## Frequency = 1
## [1] 31.55728 31.31903 31.31903 31.31903 31.31903
##
## $se
## Time Series:
## Start = 37
## End = 41
## Frequency = 1
## [1] 2.032579 2.252881 2.252881 2.252881 2.252881
astsa
Popis dát z helpu: Seasonally adjusted quarterly U.S. GNP from 1947(1) to 2002(3).
y <- log(gnp)
plot(y)
Vidíme, že v dátach je trend, teda nie sú stacionárne. Preto dáta diferencujeme.
plot(diff(y))
abline(h=0, col="red")
abline(h=mean(diff(y)), col="blue")
Keďže v pôvodných dátach bol trend, ADF test pre diferencie bude typu “drift” a v modeli pre diferencie bude konštantný člen.
summary(ur.df(diff(y), lags = 5, selectlags = "BIC", type = "drift"))
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.028862 -0.004989 -0.000281 0.005427 0.038233
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0049476 0.0009199 5.378 1.97e-07 ***
## z.lag.1 -0.5952198 0.0778643 -7.644 7.09e-13 ***
## z.diff.lag -0.0849017 0.0682341 -1.244 0.215
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.009616 on 213 degrees of freedom
## Multiple R-squared: 0.3302, Adjusted R-squared: 0.324
## F-statistic: 52.51 on 2 and 213 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -7.6443 29.2178
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1 6.52 4.63 3.81
Testová štatistika (-7.6443) je menšia ako kritická hodnota (-2.88), preto sa hypotéza o jednotkovom koreni zamieta. Dáta ďalej netreba diferencovať.
acf2(diff(y))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.35 0.19 -0.01 -0.12 -0.17 -0.11 -0.09 -0.04 0.04 0.05 0.03 -0.12 -0.13
## PACF 0.35 0.08 -0.11 -0.12 -0.09 0.01 -0.03 -0.02 0.05 0.01 -0.03 -0.17 -0.06
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF -0.10 -0.11 0.05 0.07 0.10 0.06 0.07 -0.09 -0.05 -0.10 -0.05 0.00
## PACF 0.02 -0.06 0.10 0.00 0.02 -0.04 0.01 -0.11 0.03 -0.03 0.00 0.01
Z PACF to vyzerá na AR(1), z ACF na MA(2). V modeli bude zahrnutá konštanta, keďže v pôvodných dátach bol rastúci trend.
ar1 <- sarima(y,1,1,0) # rezidua tesne OK, BIC = -6.400957
## initial value -4.589567
## iter 2 value -4.654150
## iter 3 value -4.654150
## iter 4 value -4.654151
## iter 4 value -4.654151
## iter 4 value -4.654151
## final value -4.654151
## converged
## initial value -4.655919
## iter 2 value -4.655921
## iter 3 value -4.655921
## iter 4 value -4.655922
## iter 5 value -4.655922
## iter 5 value -4.655922
## iter 5 value -4.655922
## final value -4.655922
## converged
## <><><><><><><><><><><><><><>
##
## Coefficients:
## Estimate SE t.value p.value
## ar1 0.3467 0.0627 5.5255 0
## constant 0.0083 0.0010 8.5398 0
##
## sigma^2 estimated as 9.029576e-05 on 220 degrees of freedom
##
## AIC = -6.446939 AICc = -6.446692 BIC = -6.400957
##
ma2 <- sarima(y,0,1,2) # rezidua OK, BIC = -6.388822
## initial value -4.591629
## iter 2 value -4.661095
## iter 3 value -4.662220
## iter 4 value -4.662243
## iter 5 value -4.662243
## iter 6 value -4.662243
## iter 6 value -4.662243
## iter 6 value -4.662243
## final value -4.662243
## converged
## initial value -4.662021
## iter 2 value -4.662022
## iter 2 value -4.662022
## iter 2 value -4.662022
## final value -4.662022
## converged
## <><><><><><><><><><><><><><>
##
## Coefficients:
## Estimate SE t.value p.value
## ma1 0.3028 0.0654 4.6272 0.0000
## ma2 0.2035 0.0644 3.1593 0.0018
## constant 0.0083 0.0010 8.7177 0.0000
##
## sigma^2 estimated as 8.919192e-05 on 219 degrees of freedom
##
## AIC = -6.450131 AICc = -6.449635 BIC = -6.388822
##
AR(1) model má nižšie BIC.
Overíme ešte stacionaritu.
koef <- ar1$fit$coef
koef_ar <- koef[grep("^ar",names(koef))]
Absolútna hodnota koreňa polynómu je
abs(polyroot(c(1,-koef_ar)))
## [1] 2.884724
Proces je stacionárny (aj inverovateľný - lebo je to AR).
Predikcie:
sarima.for(y, 10,1,1,0)
## $pred
## Qtr1 Qtr2 Qtr3 Qtr4
## 2002 9.165886
## 2003 9.174510 9.182946 9.191316 9.199664
## 2004 9.208004 9.216342 9.224678 9.233014
## 2005 9.241350
##
## $se
## Qtr1 Qtr2 Qtr3 Qtr4
## 2002 0.009502408
## 2003 0.015938787 0.021173624 0.025569341 0.029380482
## 2004 0.032772142 0.035850982 0.038687708 0.041330887
## 2005 0.043815131
Zachováva sa rastúci trend.