Storing large structured binary data with Haskell


I'm writing an application that interacts with a large (10-1000 GB) memory-mapped binary file holding a bunch of objects that refer to each other. I've come up with a mechanism to read/write this data that is effective, but ugly and verbose (IMO).

Q: Is there a more elegant way to achieve what I've done?

I have a typeclass for structured data, with one method that reads the structure into a Haskell datatype (DataOp is a ReaderT around IO).

    class DBStruct a where
        structRead :: Addr a -> DataOp a

To make this more readable, I have another typeclass that defines which structure members go where:

    class DBStruct structTy => StructMem structTy valTy name | structTy name -> valTy where
        offset :: structTy -> valTy -> name -> Int64

I have a few helper functions that use the offset method for reading/writing structure elements, for reading structures from stored references, and for lazily deferring structure reads (to allow lazy reading of the entire file).
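As a rough illustration of how such a helper can be built on top of the offset method, here is a minimal sketch. It is not the actual helper code: the representation of Addr (a byte offset into the mapping), the DataOp environment (the base pointer of the mapped file), and the explicit address argument to elemRead are all assumptions made for the example, and the DBStruct superclass is left out to keep it short.

```haskell
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import Data.Int (Int64)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable, peekByteOff, pokeByteOff)

-- Assumption: an address is a byte offset from the start of the mapping.
newtype Addr a = Addr Int64

-- Assumption: the reader environment is the base pointer of the mapped file.
type DataOp = ReaderT (Ptr Word8) IO

class StructMem structTy valTy name | structTy name -> valTy where
  offset :: structTy -> valTy -> name -> Int64

-- Read one member: structure address + member offset, then a Storable peek.
-- The first two arguments of 'offset' only carry types, so 'undefined' is
-- never evaluated as long as instances ignore them.
elemRead :: forall structTy valTy name.
            (StructMem structTy valTy name, Storable valTy)
         => Addr structTy -> name -> DataOp valTy
elemRead (Addr base) name = do
  ptr <- ask
  let off = offset (undefined :: structTy) (undefined :: valTy) name
  liftIO (peekByteOff ptr (fromIntegral (base + off)))

-- Hypothetical structure tag and member name, used only for the demo.
data RowBlock
data Count = Count

instance StructMem RowBlock Int64 Count where
  offset _ _ _ = 16

-- Writes a count at byte 16 of a scratch buffer (standing in for the
-- mapped file) and reads it back through 'elemRead'.
demoElemRead :: IO Int64
demoElemRead = allocaBytes 64 $ \buf -> do
  pokeByteOff buf 16 (7 :: Int64)
  runReaderT (elemRead (Addr 0 :: Addr RowBlock) Count) buf
```

The functional dependency is what lets elemRead pick the result type from the structure and the member name alone, so a call site only has to mention the name.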

The problem with this is that it involves a lot of repetition to use. For one structure, I first have to define the Haskell type:

    data RowBlock = RowBlock { rbNext :: Maybe RowBlock
                             , rbPrev :: Maybe RowBlock
                             , rbRows :: [RowTy]
                             }

Then the name types:

    data Next = Next
    data Prev = Prev
    data Count = Count
    newtype Row = Row Int64

Then the instances for each structure member:

    instance StructMem RowBlock (Maybe (Addr RowBlock)) Next where offset _ _ _ = 0
    instance StructMem RowBlock (Maybe (Addr RowBlock)) Prev where offset _ _ _ = 8
    instance StructMem RowBlock Int64 Count where offset _ _ _ = 16
    instance StructMem RowBlock RowTy Row where offset _ _ (Row n) = 24 + n * 8

Then the structure read method:

    instance DBStruct RowBlock where
        structRead = do
            n <- elemMaybePtr Next
            p <- elemMaybePtr Prev
            c <- elemRead Count
            rs <- mapM (elemRead . Row) [0 .. c-1]
            return $ RowBlock n p rs

So all I've accomplished is to re-implement C structs in a more verbose (and slow) way. I would be happier if this were more concise while preserving type safety. Surely this is a commonly encountered problem.

A few possible alternatives I can think of are:

  • Ditch memory-mapped files and use Data.Binary, writing the ByteStrings to disk the normal way.
  • Use deriving Generic to create generic read and write functions.
  • Overload functional references.
  • Do something magical with monadic lenses.
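To make the first two alternatives concrete, here is a small sketch that derives a Data.Binary instance through GHC.Generics instead of writing per-member instances. RowBlockHeader and its field names are hypothetical, and addresses are represented as plain Int64 values for simplicity. Note that the byte layout is then whatever Data.Binary chooses (for example, a Maybe field carries a tag byte), so it will not match the fixed offsets 0/8/16 used above.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Binary (Binary, decode, encode)
import Data.Int (Int64)
import GHC.Generics (Generic)

-- Hypothetical fixed part of a row block; the Maybe fields stand in for
-- optional references to neighbouring blocks.
data RowBlockHeader = RowBlockHeader
  { hdrNext  :: Maybe Int64
  , hdrPrev  :: Maybe Int64
  , hdrCount :: Int64
  } deriving (Eq, Show, Generic)

-- Empty instance body: put and get come from the Generic defaults.
instance Binary RowBlockHeader

-- Serialise to a lazy ByteString and parse it back.
roundTrip :: RowBlockHeader -> RowBlockHeader
roundTrip = decode . encode
```

This removes the repetition, at the cost of giving up control over where each member sits in the file.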

Edit: SSCCE as requested

You might try using Data.Binary with your Ptrs.

For writing:

Use Data.Binary to build a ByteString. A ByteString is a tuple (ForeignPtr Word8, Int, Int) which holds the address, offset, and length where your data is stored. You can use the Data.ByteString.Internal module to get toForeignPtr, which will unpack the tuple for you. Foreign.ForeignPtr provides withForeignPtr, which takes a function that performs an IO action via the pointer. In there you can memcpy (a binding is also provided in Data.ByteString.Internal) the ByteString storage to the mmapped Ptr you got from mmap.
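The writing steps above can be sketched roughly as follows. This is an illustration under a few assumptions: pokeBinary and demoWrite are hypothetical names, a scratch buffer from allocaBytes stands in for the mmapped region, and copyBytes from Foreign.Marshal.Utils is used in place of the memcpy binding because the latter's signature has changed between bytestring versions. Data.Binary's encode returns a lazy ByteString, so it is converted to a strict one first.

```haskell
import Data.Binary (Binary, encode)
import qualified Data.ByteString.Lazy as BL
import Data.ByteString.Internal (toForeignPtr)
import Data.Int (Int64)
import Data.Word (Word8)
import Foreign.ForeignPtr (withForeignPtr)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (peekArray)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (Ptr, plusPtr)

-- | Serialise a value with Data.Binary and copy the bytes to 'dst'.
-- Returns the number of bytes written. The caller must make sure that
-- 'dst' has room for them.
pokeBinary :: Binary a => Ptr Word8 -> a -> IO Int
pokeBinary dst x = do
  let strict         = BL.toStrict (encode x)
      (fp, off, len) = toForeignPtr strict
  -- copyBytes takes destination first, then source, then byte count.
  withForeignPtr fp $ \src -> copyBytes dst (src `plusPtr` off) len
  return len

-- Demo: write an Int64 into a scratch buffer and read the raw bytes back.
demoWrite :: IO (Int, [Word8])
demoWrite = allocaBytes 16 $ \buf -> do
  n     <- pokeBinary buf (42 :: Int64)
  bytes <- peekArray n buf
  return (n, bytes)
```

With a real mapping, dst would be the mmapped base pointer advanced by the record's offset.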

For reading:

You can use Data.ByteString.Internal's fromForeignPtr to turn a Ptr into a ByteString. This is what the mmap libraries do, but you can do it a record at a time instead of for the entire region. Once you have a ByteString view on the memory, you can unpack it with Data.Binary.
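A minimal sketch of the reading direction, under similar assumptions: peekBinary and demoRead are hypothetical names, and a temporary array from withArray stands in for the mapped region. fromForeignPtr expects a ForeignPtr, so the raw Ptr is first wrapped with newForeignPtr_, which attaches no finalizer.

```haskell
import Data.Binary (Binary, decode)
import qualified Data.ByteString.Lazy as BL
import Data.ByteString.Internal (fromForeignPtr)
import Data.Int (Int64)
import Data.Word (Word8)
import Foreign.ForeignPtr (newForeignPtr_)
import Foreign.Marshal.Array (withArray)
import Foreign.Ptr (Ptr)

-- | View 'len' bytes at 'src' as a ByteString without copying, then decode
-- them with Data.Binary. The result is forced to weak head normal form
-- before returning; values with lazy fields may need deeper forcing while
-- the underlying memory is still valid.
peekBinary :: Binary a => Ptr Word8 -> Int -> IO a
peekBinary src len = do
  fp <- newForeignPtr_ src            -- no finalizer: the mapping owns the memory
  let view = fromForeignPtr fp 0 len  -- ByteString aliasing the same bytes
  return $! decode (BL.fromStrict view)

-- Demo: decode a big-endian Int64 from eight bytes in temporary memory.
demoRead :: IO Int64
demoRead = withArray [0, 0, 0, 0, 0, 0, 0, 42] $ \buf -> peekBinary buf 8
```

Because the ByteString aliases the mapped memory, it should not outlive the mapping, and later writes to that region will be visible through it.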

Another option is to take advantage of the fact that ByteString has an alternative implementation in Data.Vector.Storable.ByteString, which would let you use the Storable interface you're using now to read/write them to mmapped Ptrs. Its interface and basic type are isomorphic to the Data.ByteString ones, but it has Storable instances.

