Importing a data frame vs creating one in R
I am trying to create a specialized summary 'matrix' for my supervisor, and I would like R to export it in a clean, readable form. So I am basically creating it from scratch, to tailor it to our project. The problem is that I can't figure out how to make a data frame I created behave like an imported one, with headers.
I am comfortable dealing with imported data frames that have headers, and with calling specific columns by name instead of by column number:
```r
iris$Sepal.Length
with(iris, Sepal.Length)
iris['Sepal.Length']
```
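The three forms above are not quite equivalent: `$` and `with()` return the column as a plain vector, while single-bracket indexing by name returns a one-column data frame. Double brackets (`[[`) give the vector form. A quick check using the built-in iris data:

```r
# Several ways to pull one column out of a data frame by name
v1 <- iris$Sepal.Length          # numeric vector
v2 <- iris[["Sepal.Length"]]     # numeric vector, same as $
v3 <- with(iris, Sepal.Length)   # numeric vector, evaluated inside iris
d1 <- iris["Sepal.Length"]       # one-column data frame, not a vector

str(v1)
str(d1)
```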
Now, if I want to create a data frame (or a matrix; I'm not entirely sure what the difference is), I have tried the following:
```r
groups <- c("group 1", "group 2")
factors <- c("fac 1", "fac 2", "fac 3", "fac 4", "fac 5")
x <- 1:10
y <- 11:20
z <- 21:30
data <- cbind(groups, factors, x, y, z)  # a matrix; groups and factors are recycled to 10 rows
names(data)  # returns NULL
data$x       # error: $ does not work on a matrix, so column 'x' can't be pulled out this way
data <- data.frame(groups, factors, x, y, z)  # without cbind(), x, y and z stay numeric
names(data)  # confirms there are header names
```
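This is the difference the question runs into. `cbind()` on vectors returns a matrix, which is a single atomic vector with dimensions, so every cell must share one type (here everything is coerced to character) and the column labels live in `colnames()`, not `names()`. A data frame is a list of columns, so `names()` and `$` work and each column keeps its own type. A small comparison with the same vectors:

```r
groups  <- c("group 1", "group 2")
factors <- c("fac 1", "fac 2", "fac 3", "fac 4", "fac 5")
x <- 1:10
y <- 11:20
z <- 21:30

m  <- cbind(groups, factors, x, y, z)        # character matrix, 10 x 5
df <- data.frame(groups, factors, x, y, z)   # data frame, 10 rows, one type per column

names(m)        # NULL: a matrix keeps its labels in dimnames
colnames(m)     # "groups" "factors" "x" "y" "z"
typeof(m)       # "character": the numbers were coerced to text
m[, "x"]        # a matrix column can still be selected by name with [ , ]

sapply(df, class)  # each data frame column keeps its own class
df$x               # $ works on a data frame and returns the numeric column
```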
This gives me a data frame with columns x, y and z, but in reality I don't have a premade column to start off with. If I knew how many rows of data there were, I could do:
```r
data <- data.frame(1:10)  # placeholder first column with the right number of rows
data$x <- x
data$y <- y
data$z <- z
```
I tried creating an empty data frame, one element big, but if I try to append a vector (of length greater than 1), I get an error:
```r
data <- data.frame(0)
data$x <- x  # returns an error
```
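The error comes from the row count: `data.frame(0)` has exactly one row, and assigning with `$<-` only accepts a vector whose length matches the existing number of rows (a shorter vector is recycled only if it divides that number evenly). A sketch of the failure and of two ways around it: start the frame from the first real column, or collect the vectors and call `data.frame()` once.

```r
x <- 1:10
y <- 11:20
z <- 21:30

data <- data.frame(0)
nrow(data)   # 1

# Assigning a length-10 vector to a 1-row data frame fails
res <- tryCatch({ data$x <- x; "ok" },
                error = function(e) conditionMessage(e))
print(res)

# Option 1: start the frame from the first real column, then add more
data <- data.frame(x = x)
data$y <- y
data$z <- z

# Option 2: build all columns in one call
data2 <- data.frame(x = x, y = y, z = z)
all.equal(data, data2)
```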
My best guess at the moment is to pass through the data once to find out how many rows of data I have (there are several factor levels, and the summary matrix will have a row for each possible combination of factors). Then I can get the data frame started simply:
```r
data <- data.frame(row = seq_len(n))  # where n is how many rows of data I have
```
and then follow through by creating individual vectors for each summary statistic I want and appending them to the data frame with `$`.
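Instead of counting rows in a first pass, base R's `expand.grid()` can generate one row per combination of group and factor directly; the summary columns can then be added (pre-filled with `NA` here) and filled in as each statistic is computed. A minimal sketch, assuming the group and factor labels are known up front; `summary_df` is a hypothetical name:

```r
groups  <- c("group 1", "group 2")
factors <- c("fac 1", "fac 2", "fac 3", "fac 4", "fac 5")

# One row per (group, factor) combination: 2 x 5 = 10 rows.
# expand.grid varies its first argument fastest, so listing factors first
# keeps each group's factors together; the columns are then reordered.
summary_df <- expand.grid(factors = factors, groups = groups,
                          stringsAsFactors = FALSE)[, c("groups", "factors")]

# Empty summary columns (a length-1 NA is recycled to all 10 rows)
summary_df$mean.x <- NA_real_
summary_df$mean.y <- NA_real_
summary_df$mean.z <- NA_real_

summary_df
```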
Another solution I tried was to create a matrix and fill in each element as I calculate it within a loop. I know the apply family is usually preferred over a loop, but to make a summary table tailored to my needs, I would need to run an apply function and then try to pull out the individual values:
```r
means <- with(iris, tapply(Petal.Width, Species, mean))
means[1]  # this returns the species and its mean petal width
```

I only need the numeric part of this, since I will have my own headers, or possibly a separate summary table for each species.
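`tapply()` returns a named one-dimensional array, so the label and the number can be separated with standard base R tools: `as.vector()` drops the names, and `names()` recovers the labels if they are wanted as their own column. A short sketch; `means_df` and its column names are illustrative:

```r
# Mean petal width per species; tapply returns a named 1-d array
means <- with(iris, tapply(Petal.Width, Species, mean))

means["setosa"]               # value with its label
as.vector(means["setosa"])    # just the number

# Or turn the whole result into a data frame with your own headers
means_df <- data.frame(species = names(means),
                       mean.petal.width = as.vector(means),
                       stringsAsFactors = FALSE)
means_df
```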
I'm not sure whether extracting the numbers from the apply output is better or easier than writing my own loop to calculate the required statistics. The outer loop would go through the groups (2 runs) and an inner loop through the factors (5 runs), for a total of 10 runs through the data. I was thinking of creating an empty matrix and saving each value in the appropriate cell when it is calculated. The problem, again, is addressing a specific column of the matrix by name. I have tried:
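If a loop feels more natural, the cell-addressing problem goes away once the container is a data frame with named columns: a cell can be written as `out[i, "mean.x"] <- value`. A minimal sketch of the two-level loop described above, using a hypothetical `mydata` built from the vectors in the question. In this toy data every combination happens to occur exactly once, so each "mean" is a single value; with real data each cell would average several rows.

```r
groups  <- c("group 1", "group 2")
factors <- c("fac 1", "fac 2", "fac 3", "fac 4", "fac 5")

# Hypothetical input data (groups and factors are recycled to 10 rows)
mydata <- data.frame(groups = groups, factors = factors,
                     x = 1:10, y = 11:20, z = 21:30,
                     stringsAsFactors = FALSE)

# Preallocate one row per combination (2 x 5 = 10)
out <- data.frame(groups  = rep(groups, each = length(factors)),
                  factors = rep(factors, times = length(groups)),
                  mean.x = NA_real_, mean.y = NA_real_, mean.z = NA_real_,
                  stringsAsFactors = FALSE)

i <- 0
for (g in groups) {          # outer loop: 2 groups
  for (f in factors) {       # inner loop: 5 factors
    i <- i + 1
    rows <- mydata$groups == g & mydata$factors == f
    out[i, "mean.x"] <- mean(mydata$x[rows])   # write a cell by row and column name
    out[i, "mean.y"] <- mean(mydata$y[rows])
    out[i, "mean.z"] <- mean(mydata$z[rows])
  }
}
out
```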
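```r
m <- matrix(0, ncol = 5)
# These assignments store the labels as data in row 1 (and turn m into a
# character matrix); they do not set column names
m[1, 1] <- 'groups'
m[1, 2] <- 'factors'
m[1, 3] <- 'mean.x'
m[1, 4] <- 'mean.y'
m[1, 5] <- 'mean.z'
names(m)  # returns NULL
```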
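Headers on a matrix are set with `colnames()` or the `dimnames` argument of `matrix()`, after which cells can be addressed by column name. A small sketch; the dimensions and the value written are illustrative:

```r
stat_names <- c("mean.x", "mean.y", "mean.z")

# Numeric matrix with 10 rows and named columns, pre-filled with NA
m <- matrix(NA_real_, nrow = 10, ncol = 3,
            dimnames = list(NULL, stat_names))

m[1, "mean.x"] <- 5.5    # write a cell by row number and column name
m[, "mean.y"]            # read a whole column by name
colnames(m)              # "mean.x" "mean.y" "mean.z"

# A matrix holds a single type, so the text columns (groups, factors)
# fit better in a data frame; conversion keeps the column names
as.data.frame(m)
```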
My desired output would look like:
```
groups   factors  mean.x  mean.y  mean.z
group 1  fac 1
group 1  fac 2
group 1  fac 3
```
and so on, for all combinations of groups and factors.
You can use `ddply` from the plyr package for that. Assuming your original data frame is `mydata` and the new data frame that stores the result is `newdata`:
```r
library(plyr)
newdata <- ddply(mydata, .(groups, factors), summarize,
                 mean.x = mean(x), mean.y = mean(y), mean.z = mean(z))
```
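For comparison, base R's `aggregate()` with a formula produces the same kind of table without an extra package. A sketch under the same assumption as the answer (a data frame `mydata` with columns `groups`, `factors`, `x`, `y`, `z`); the `mydata` built here is hypothetical test data with two rows per combination:

```r
# Hypothetical test data in the shape the answer assumes
mydata <- data.frame(groups  = rep(c("group 1", "group 2"), each = 10),
                     factors = rep(c("fac 1", "fac 2", "fac 3", "fac 4", "fac 5"), times = 4),
                     x = 1:20, y = 21:40, z = 41:60,
                     stringsAsFactors = FALSE)

# Mean of x, y and z for every groups x factors combination
newdata <- aggregate(cbind(x, y, z) ~ groups + factors, data = mydata, FUN = mean)
names(newdata)[3:5] <- c("mean.x", "mean.y", "mean.z")

# Sort so each group's factors are listed together
newdata <- newdata[order(newdata$groups, newdata$factors), ]
newdata
```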
For example, with `mydata <- iris`:
```r
> newdata <- ddply(mydata, .(Species), numcolwise(mean))
> newdata
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026
```
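Since the goal in the question is a clean, readable summary to hand on, the finished table can be written straight to a CSV file. A sketch that rebuilds the same per-species table with base R's `aggregate()` (so it runs without plyr), rounds it, and writes it with `write.csv()`; the file name is a placeholder in a temporary directory:

```r
# Per-species means of every numeric column (same table as above, base R)
newdata <- aggregate(. ~ Species, data = iris, FUN = mean)

# Round the numeric columns for readability
newdata_rounded <- newdata
newdata_rounded[-1] <- round(newdata[-1], 3)

# Write without R's row numbers; the path is a placeholder
out_file <- file.path(tempdir(), "species_means.csv")
write.csv(newdata_rounded, out_file, row.names = FALSE)

# Read it back to check what a reader of the file would see
back <- read.csv(out_file)
back
```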