r - What makes rollmean faster than rollapply (code-wise)?
I regularly compute rolling statistics on time series (particularly means), and was surprised to find that rollmean is notably faster than rollapply, and that calling rollmean with align = 'right' is faster than using the rollmeanr wrapper.
How have they achieved this speed-up? And why does one lose some of it when using the rollmeanr() wrapper?
Some background: I had been using rollapplyr(x, n, function(x) mean(x)), but recently happened upon a few examples using rollmean. The documentation suggests that rollapplyr(x, n, mean) (note: without the function part of the argument) uses rollmean, so I didn't think there would be much difference in performance. However, rbenchmark revealed notable differences.
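One plausible reason the wrapped form misses the fast path can be sketched in base R. This assumes the shortcut is selected by checking whether FUN is the mean function itself, which is an assumption about zoo's dispatch rather than something confirmed here; wrapped_mean is a hypothetical name used only for illustration.

```r
# A bare function and a wrapper around it are different objects,
# even though they return the same values.
wrapped_mean <- function(x) mean(x)

# Comparing the function objects themselves:
same_object     <- identical(mean, mean)          # TRUE
wrapper_is_mean <- identical(wrapped_mean, mean)  # FALSE

# Comparing what they compute:
same_result <- isTRUE(all.equal(wrapped_mean(c(1, 2, 3)), mean(c(1, 2, 3))))  # TRUE
```

Under that assumption, passing mean directly lets the caller be recognised and redirected, while function(x) mean(x) looks like an arbitrary function and has to be evaluated on every window.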
require(zoo)
require(rbenchmark)

x <- rnorm(1e4)

r1 <- function() rollapplyr(x, 3, mean)                  # uses rollmean
r2 <- function() rollapplyr(x, 3, function(x) mean(x))   # generic rolling apply
r3 <- function() rollmean(x, 3, na.pad = TRUE, align = 'right')
r4 <- function() rollmeanr(x, 3, align = "right")        # wrapper around rollmean

bb <- benchmark(r1(), r2(), r3(), r4(),
                columns = c('test', 'elapsed', 'relative'),
                replications = 100,
                order = 'elapsed')
print(bb)
I was surprised to find that rollmean(x, n, align = 'right') was notably faster, and about 40x faster than my rollapply(x, n, function(x) mean(x)) approach.
  test elapsed relative
3 r3()    0.74    1.000
4 r4()    0.86    1.162
1 r1()    0.98    1.324
2 r2()   27.53   37.203
The difference seems to get larger as the size of the data set grows. I changed only the size of x (to rnorm(1e5)) in the above code and re-ran the test, and there was an even larger difference between the functions.
  test elapsed relative
3 r3()   13.33    1.000
4 r4()   17.43    1.308
1 r1()   19.83    1.488
2 r2()  279.47   20.965
And for x <- rnorm(1e6):
  test elapsed relative
3 r3()   44.23    1.000
4 r4()   54.30    1.228
1 r1()   65.30    1.476
2 r2() 2473.35   55.920
How have they done this? Also, is this the optimal solution? Sure, this is fast, but is there an even faster way to do this?
(Note: in general my time series are almost always xts objects. Does this matter?)
Computing a rolling mean is faster than computing a general rolling function, because the first one is easier to compute. When computing a general rolling function you have to compute the function on each window again and again, which you don't have to do for the mean, because of a simple identity:
(a2 + a3 + ... + an)/(n-1) = (a1 + a2 + ... + a(n-1))/(n-1) + (an - a1)/(n-1)
And you can see how that's leveraged by looking at getAnywhere(rollmean.zoo).
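The identity above can be sketched in base R as follows. This is a minimal illustration, not zoo's actual code: the function names are hypothetical, the window width k is assumed to be no larger than length(x), and NA handling is ignored.

```r
# Naive approach: recompute mean() on every window, as a generic
# rolling apply has to do for an arbitrary function.
roll_mean_naive <- function(x, k) {
  n <- length(x)
  vapply(seq_len(n - k + 1), function(i) mean(x[i:(i + k - 1)]), numeric(1))
}

# Incremental approach: each window mean is the previous one plus
# (value entering the window - value leaving it) / k.
roll_mean_incremental <- function(x, k) {
  n <- length(x)
  out <- numeric(n - k + 1)
  out[1] <- mean(x[1:k])
  for (i in seq_len(n - k)) {
    out[i + 1] <- out[i] + (x[i + k] - x[i]) / k
  }
  out
}

# Vectorised variant of the same idea: window sums are differences
# of cumulative sums, so no per-window function call is needed.
roll_mean_cumsum <- function(x, k) {
  cs <- c(0, cumsum(x))
  (cs[(k + 1):length(cs)] - cs[1:(length(cs) - k)]) / k
}

# Usage: all three should agree up to floating-point error.
set.seed(1)
x <- rnorm(1000)
res_naive <- roll_mean_naive(x, 3)
res_incr  <- roll_mean_incremental(x, 3)
res_cs    <- roll_mean_cumsum(x, 3)
```

The naive version does roughly k operations plus one function call per window, while the other two do constant work per window, which is why the gap widens as the data grows.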
If you want an even faster rolling mean, use runmean from caTools, which is implemented in C, making it much faster (it also scales a lot better, so it gets even faster relative to rollmean as the size of the data increases).
library(microbenchmark)
library(caTools)
library(zoo)

x = rnorm(1e4)
microbenchmark(runmean(x, 3, endrule = 'trim', align = 'right'),
               rollmean(x, 3, align = 'right'))
#Unit: microseconds
#                                             expr      min        lq     median        uq       max neval
# runmean(x, 3, endrule = "trim", align = "right")  631.061  740.0775   847.5915  1020.048  1652.109   100
#                  rollmean(x, 3, align = "right") 7308.947 9155.7155 10627.0210 12760.439 16919.092   100
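Before swapping one function for the other, it may be worth checking that the two calls in the benchmark return the same values. The sketch below assumes both packages are installed and an odd window width (3, as in the benchmark); the requireNamespace guard and the variable agree are additions for illustration, and agree is left as NA when either package is missing.

```r
set.seed(42)
x <- rnorm(1e4)
k <- 3  # odd window width, matching the benchmark

if (requireNamespace("zoo", quietly = TRUE) &&
    requireNamespace("caTools", quietly = TRUE)) {
  a <- zoo::rollmean(x, k, align = "right")
  b <- caTools::runmean(x, k, endrule = "trim", align = "right")
  # Compare lengths and values, ignoring attributes and tiny rounding differences.
  agree <- length(a) == length(b) &&
    isTRUE(all.equal(as.numeric(a), as.numeric(b)))
} else {
  agree <- NA  # cannot compare without both packages
}
```

If agree is FALSE for your data or window width, compare the end handling (endrule, fill) of the two functions before relying on the timing result.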