saveHDF5SummarizedExperiment {SummarizedExperiment} | R Documentation |
saveHDF5SummarizedExperiment
and loadHDF5SummarizedExperiment
can be used to save/load a HDF5-based SummarizedExperiment object
to/from disk.
saveHDF5SummarizedExperiment(x, dir="my_h5_se", replace=FALSE, chunk_dim=NULL, level=NULL, verbose=FALSE) loadHDF5SummarizedExperiment(dir="my_h5_se")
x |
A SummarizedExperiment object. |
dir |
The path (as a single string) to the directory where to save the
HDF5-based SummarizedExperiment object or to load it from.
When saving, the directory will be created so should not already exist,
unless |
replace |
If directory |
chunk_dim, level |
The dimensions of the chunks and the compression level to use for
writting the assay data to disk.
Passed to the internal calls to |
verbose |
Set to |
These functions use functionalities from the rhdf5 and HDF5Array packages internally and so require these packages to be installed.
saveHDF5SummarizedExperiment
creates the directory specified
thru the dir
argument and then populates it with the HDF5 datasets
(one per assay in x
) plus a serialized version of x
that
contains pointers to these datasets. This directory provides a
self-contained HDF5-based representation of x
that can then
be loaded back in R with loadHDF5SummarizedExperiment
.
Note that this directory is relocatable i.e. it can be moved
(or copied) to a different place, on the same or a different computer,
before calling loadHDF5SummarizedExperiment
on it. For convenient
sharing with collaborators, it is suggested to turn it into a tarball
(with Unix command tar
), or zip file, before the transfer.
Please keep in mind that saveHDF5SummarizedExperiment
and
loadHDF5SummarizedExperiment
don't know how to produce/read
tarballs or zip files at the moment, so the process of packaging/extracting
the tarball or zip file is entirely the user responsibility. It is
typically done from outside R.
Finally please note that, depending on the size of the data to write to
disk and the performance of the disk, saveHDF5SummarizedExperiment
can take a long time to complete. Use verbose=TRUE
to see its
progress.
loadHDF5SummarizedExperiment
is generally very fast, even if
the assay data is big, because all the assays in the returned object
are HDF5Array objects pointing to the on-disk HDF5
datasets located in dir
. HDF5Array objects are
typically light-weight in memory.
saveHDF5SummarizedExperiment
returns an invisible
SummarizedExperiment object where all the assays are
HDF5Array objects pointing to the HDF5 datasets
saved in dir
. It's in fact the same obect as the object that
would be returned by calling loadHDF5SummarizedExperiment
on
dir
.
Hervé Pagès
SummarizedExperiment and RangedSummarizedExperiment objects.
HDF5Array objects in the HDF5Array package.
The writeHDF5Array
function in the
HDF5Array package, which saveHDF5SummarizedExperiment
uses internally to write the assay data to disk.
nrows <- 200; ncols <- 6 counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows) colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3), row.names=LETTERS[1:6]) se0 <- SummarizedExperiment(assays=SimpleList(counts=counts), colData=colData) se0 ## Save 'se0' as an HDF5-based SummarizedExperiment object: dir <- sub("file", "h5_se0_", tempfile()) h5_se0 <- saveHDF5SummarizedExperiment(se0, dir) h5_se0 assay(h5_se0, withDimnames=FALSE) # HDF5Matrix object h5_se0b <- loadHDF5SummarizedExperiment(dir) h5_se0b assay(h5_se0b, withDimnames=FALSE) # HDF5Matrix object ## Sanity checks: stopifnot(is(assay(h5_se0, withDimnames=FALSE), "HDF5Matrix")) stopifnot(all(DelayedArray(assay(se0)) == assay(h5_se0))) stopifnot(is(assay(h5_se0b, withDimnames=FALSE), "HDF5Matrix")) stopifnot(all(DelayedArray(assay(se0)) == assay(h5_se0b))) ## --------------------------------------------------------------------- ## More sanity checks ## --------------------------------------------------------------------- ## Make a copy of directory 'dir': somedir <- sub("file", "somedir", tempfile()) dir.create(somedir) file.copy(dir, somedir, recursive=TRUE) dir2 <- list.files(somedir, full.names=TRUE) ## 'dir2' contains a copy of 'dir'. Call loadHDF5SummarizedExperiment() ## on it. h5_se0c <- loadHDF5SummarizedExperiment(dir2) stopifnot(is(assay(h5_se0c, withDimnames=FALSE), "HDF5Matrix")) stopifnot(all(DelayedArray(assay(se0)) == assay(h5_se0c)))