R - What makes rollmean faster than rollapply (code-wise)?


I regularly compute rolling statistics of time series (particularly means), and was surprised to find that rollmean is notably faster than rollapply, and that the align = 'right' methods are faster than the rollmeanr wrappers.

How have they achieved this speed-up? And why does one lose some of it when using the rollmeanr() wrapper?

Some background: I had been using rollapplyr(x, n, function(x) mean(x)), but happened upon a few examples using rollmean. The documentation suggests that rollapplyr(x, n, mean) (note: without the function part of the argument) uses rollmean, so I didn't think there would be much difference in performance, but rbenchmark revealed notable differences.

    require(zoo)
    require(rbenchmark)

    x <- rnorm(1e4)
    r1 <- function() rollapplyr(x, 3, mean) # uses rollmean
    r2 <- function() rollapplyr(x, 3, function(x) mean(x))
    r3 <- function() rollmean(x, 3, na.pad = TRUE, align = 'right')
    r4 <- function() rollmeanr(x, 3, align = "right")

    bb <- benchmark(r1(), r2(), r3(), r4(),
                    columns = c('test', 'elapsed', 'relative'),
                    replications = 100,
                    order = 'elapsed')

    print(bb)

I was surprised to find that rollmean(x, n, align = 'right') was notably faster, and about 40x faster than my rollapply(x, n, function(x) mean(x)) approach.

      test elapsed relative
    3 r3()    0.74    1.000
    4 r4()    0.86    1.162
    1 r1()    0.98    1.324
    2 r2()   27.53   37.203

The difference seems to get even larger as the size of the data set grows. I changed only the size of x (to rnorm(1e5)) in the above code and re-ran the test, and there was an even larger difference between the functions.

      test elapsed relative
    3 r3()   13.33    1.000
    4 r4()   17.43    1.308
    1 r1()   19.83    1.488
    2 r2()  279.47   20.965

And for x <- rnorm(1e6):

      test elapsed relative
    3 r3()   44.23    1.000
    4 r4()   54.30    1.228
    1 r1()   65.30    1.476
    2 r2() 2473.35   55.920

How have they done this? Also, is this the optimal solution? Sure, it's fast, but is there an even faster way to do this?

(Note: in general my time series are almost always xts objects. Does this matter?)

Computing a rolling mean is faster than computing a general rolling function, because the first one is easier to compute. When computing a general rolling function you have to compute the function on each window again and again, which you don't have to do for the mean, because of a simple identity:

 (a2 + a3 + ... + an)/(n-1) = (a1 + a2 + ... + a(n-1))/(n-1) + (an - a1)/(n-1) 

And you can see how that's leveraged by looking at getAnywhere(rollmean.zoo).
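To make the identity concrete, here is a minimal base-R sketch of the idea. It is not zoo's actual implementation, and the function names (roll_mean_naive, roll_mean_incremental, roll_mean_cumsum) are made up for illustration. The sketch assumes a plain numeric vector with no NAs and a window width k no larger than length(x).

```r
# Naive approach: recompute the mean of every window from scratch.
# Cost grows with (number of windows) * (window width).
roll_mean_naive <- function(x, k) {
  n <- length(x)
  out <- numeric(n - k + 1)
  for (i in seq_along(out)) {
    out[i] <- mean(x[i:(i + k - 1)])
  }
  out
}

# Incremental approach: reuse the previous window's mean.
# Moving the window one step drops x[i] and gains x[i + k].
roll_mean_incremental <- function(x, k) {
  n <- length(x)
  out <- numeric(n - k + 1)
  out[1] <- mean(x[1:k])
  for (i in seq_len(n - k)) {
    out[i + 1] <- out[i] + (x[i + k] - x[i]) / k
  }
  out
}

# Vectorised variant of the same idea: every window sum is the
# difference of two cumulative sums, so no explicit loop is needed.
roll_mean_cumsum <- function(x, k) {
  n <- length(x)
  cs <- c(0, cumsum(x))
  (cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]) / k
}

set.seed(1)
x <- rnorm(1000)
all.equal(roll_mean_naive(x, 3), roll_mean_incremental(x, 3))
all.equal(roll_mean_naive(x, 3), roll_mean_cumsum(x, 3))
```

The results here are unpadded (length(x) - k + 1 values), so alignment is only a question of where you pad. Note that the incremental and cumulative-sum versions can accumulate floating-point error on very long or badly scaled series, which the naive version does not.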

If you want an even faster rolling mean, use runmean from caTools, which is implemented in C, making it much faster (it also scales a lot better, so the advantage grows as the size of the data increases).

    library(microbenchmark)
    library(caTools)
    library(zoo)

    x = rnorm(1e4)
    microbenchmark(runmean(x, 3, endrule = 'trim', align = 'right'),
                   rollmean(x, 3, align = 'right'))
    #Unit: microseconds
    #                                             expr      min        lq     median        uq       max neval
    # runmean(x, 3, endrule = "trim", align = "right")  631.061  740.0775   847.5915  1020.048  1652.109   100
    #                  rollmean(x, 3, align = "right") 7308.947 9155.7155 10627.0210 12760.439 16919.092   100
