R - What makes rollmean faster than rollapply (code-wise)?
I regularly compute rolling statistics of time series (particularly means), and was surprised to find that rollmean is notably faster than rollapply, and that the align = 'right' methods are faster than the rollmeanr wrappers.
How have they achieved this speed-up? And why does one lose some of it when using the rollmeanr() wrapper?
Some background: I had been using rollapplyr(x, n, function(x) mean(x)), but happened upon a few examples using rollmean. The documentation suggests that rollapplyr(x, n, mean) (note: without the function part of the argument) uses rollmean, so I didn't think there would be a difference in performance, but rbenchmark revealed notable differences.
    require(zoo)
    require(rbenchmark)

    x <- rnorm(1e4)
    r1 <- function() rollapplyr(x, 3, mean) # uses rollmean
    r2 <- function() rollapplyr(x, 3, function(x) mean(x))
    r3 <- function() rollmean(x, 3, na.pad = TRUE, align = 'right')
    r4 <- function() rollmeanr(x, 3, align = "right")

    bb <- benchmark(r1(), r2(), r3(), r4(),
                    columns = c('test', 'elapsed', 'relative'),
                    replications = 100,
                    order = 'elapsed')

    print(bb)

I was surprised to find that rollmean(x, n, align = 'right') was notably faster, and ~40x faster than my rollapply(x, n, function(x) mean(x)) approach.
      test elapsed relative
    3 r3()    0.74    1.000
    4 r4()    0.86    1.162
    1 r1()    0.98    1.324
    2 r2()   27.53   37.203

The difference seems to get larger as the size of the data set grows. I changed the size of x (to rnorm(1e5)) in the above code and re-ran the test, and there was a larger difference between the functions.
      test elapsed relative
    3 r3()   13.33    1.000
    4 r4()   17.43    1.308
    1 r1()   19.83    1.488
    2 r2()  279.47   20.965

And with x <- rnorm(1e6):
      test elapsed relative
    3 r3()   44.23    1.000
    4 r4()   54.30    1.228
    1 r1()   65.30    1.476
    2 r2() 2473.35   55.920

How have they done this? Also, is this the optimal solution? Sure, this is fast, but is there an even faster way to do this?
(Note: in general my time series are xts objects; does this matter?)
Computing a rolling mean is faster than computing a general rolling function, because the first one is easier to compute. When computing a general rolling function you have to compute the function on each window again and again, which you don't have to do for the mean, because of a simple identity:
    (a2 + a3 + ... + an)/(n-1) = (a1 + a2 + ... + a(n-1))/(n-1) + (an - a1)/(n-1)

You can see how that's leveraged by looking at getAnywhere(rollmean.zoo).
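To make the identity concrete, here is a minimal base-R sketch comparing a from-scratch rolling mean with one that updates each window from the previous one. The function names roll_mean_naive and roll_mean_incremental are hypothetical helpers written for this illustration; this is not zoo's actual source, just one way the identity can be exploited.

```r
# Naive approach: recompute the mean of every window from scratch.
# Work grows with n * k (number of windows times window width).
roll_mean_naive <- function(x, k) {
  n <- length(x)
  stopifnot(k >= 1, n >= k)
  vapply(seq_len(n - k + 1),
         function(i) mean(x[i:(i + k - 1)]),
         numeric(1))
}

# Incremental approach: each window sum equals the previous window sum
# plus the value entering the window minus the value leaving it.
# Accumulating those differences with cumsum() makes a single pass over x.
roll_mean_incremental <- function(x, k) {
  n <- length(x)
  stopifnot(k >= 1, n >= k)
  m <- n - k  # number of window shifts after the first window
  # First element: sum of the first window; the rest: entering - leaving.
  steps <- c(sum(x[seq_len(k)]), x[k + seq_len(m)] - x[seq_len(m)])
  cumsum(steps) / k
}

set.seed(1)
x <- rnorm(1e4)

# Both versions should agree up to floating-point tolerance.
print(all.equal(roll_mean_naive(x, 3), roll_mean_incremental(x, 3)))
```

Both helpers return one value per complete window (length(x) - k + 1 values), which corresponds to a right-aligned rolling mean without padding. The incremental version trades a little floating-point accuracy (errors can accumulate in the running sum) for doing constant work per window.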
If you want an even faster rolling mean, use runmean from caTools, which is implemented in C, making it much faster (it also scales a lot better, so its advantage grows as the size of the data increases).
    library(microbenchmark)
    library(caTools)
    library(zoo)

    x = rnorm(1e4)
    microbenchmark(runmean(x, 3, endrule = 'trim', align = 'right'),
                   rollmean(x, 3, align = 'right'))
    #Unit: microseconds
    #                                             expr      min        lq     median        uq       max neval
    # runmean(x, 3, endrule = "trim", align = "right")  631.061  740.0775   847.5915  1020.048  1652.109   100
    #                  rollmean(x, 3, align = "right") 7308.947 9155.7155 10627.0210 12760.439 16919.092   100