HighFive 3.0.0
HighFive - Header-only C++ HDF5 interface
A collection of tips for migrating away from deprecated features.
FixedLenStringArray

The issue with `FixedLenStringArray` is that it is unable to avoid copies. Essentially, this class acts as a means to create a copy of the data in a format suitable for writing fixed-length strings. Additionally, the class acts as a tag for HighFive to overload on. The support for `std::string` in HighFive has improved considerably: since 2.8.0 we can write/read `std::string` to fixed- or variable-length HDF5 strings.

Therefore, this class serves no purpose anymore. Any occurrence of it can be replaced with an `std::vector<std::string>` (for example).

If desired, one can silence deprecation warnings by replacing `FixedLenStringArray` with `deprecated::FixedLenStringArray`.
read(T*, ...)

A "raw read" is when the user allocates sufficient bytes and provides HighFive with a pointer to the first byte. "Regular reads" take a detour via the inspector and might resize the container, etc.

The issue is that HighFive v2 had the following two overloads:
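A sketch of the two signatures (simplified; the exact qualifiers and default arguments in v2 may differ):

```cpp
// v2 (sketch): the "regular read" overload
template <class T>
void DataSet::read(T& array) const;

// v2 (sketch): the "raw read" overload, removed in v3
template <class T>
void DataSet::read(T* array) const;
```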
and the analogous overloads for Attribute.
The issue is that the second overload will also match things like T**
and T[][]
. For example the following code used the removed overload:
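For instance (a sketch; `dset` is an open dataset of shape `{2, 3}`):

```cpp
double x[2][3];
dset.read(x);  // in v2 this resolved to the T* overload
```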
which is fine because `x` is a contiguous sequence of doubles. It's equivalent to the following v3 code:
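A sketch of the equivalent explicit raw read:

```cpp
double x[2][3];
dset.read_raw(reinterpret_cast<double*>(x));
```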
We consider the example above to be accidentally using a raw read when it could be performing a regular read. We suggest leaving such code unchanged: it continues to be correct in v3, and HighFive can check that the dimensions match. The inspector recognizes `double[n][m]` as a contiguous array of doubles; therefore it uses the shallow-copy buffer and avoids any additional allocations or copies.
When genuinely performing a "raw read", one must replace read
with read_raw
. For example:
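A sketch, assuming the caller knows the dataset holds `n * m` doubles:

```cpp
// the caller owns the buffer and guarantees it is large enough
std::vector<double> x(n * m);
dset.read_raw(x.data());
```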
is correct in v3
.
T**, T***, etc.

The immediately preceding section is likely relevant.

In v2, raw pointers could be used to indicate dimensionality. For example:
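A sketch of such v2 code (`file` is an open `HighFive::File`; the cast to `double**` is what signalled the dimensionality):

```cpp
size_t n = 2, m = 3;
auto dset = file.createDataSet<double>("foo", DataSpace({n, m}));
std::vector<double> x(n * m);                      // flat allocation
dset.write(reinterpret_cast<double**>(x.data()));  // v2: T** as a dimensionality tag
```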
was valid and would write the flat array x
into the two-dimensional dataset "foo"
. This must be modernized as follows:
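Assuming the same flat buffer `x`, a sketch of the v3 replacement:

```cpp
dset.write_raw(x.data());  // v3: raw access is spelled out explicitly
```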
In v3, the type T** will refer to a pointer to a pointer (as usual). The following:
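A sketch, with one heap allocation per row:

```cpp
std::vector<std::vector<double>> rows(n, std::vector<double>(m));
std::vector<double*> x(n);
for (size_t i = 0; i < n; ++i) x[i] = rows[i].data();
dset.write(x.data());  // double**: each row is reached through its own pointer
```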
is correct in v3
but would probably segfault in v2
.
CMake

In v3 we completely rewrote the CMake code of HighFive. Since HighFive is a header-only library, the CMake code needs to perform two tasks: add `-I ${HIGHFIVE_DIR}` to the include path, and link with HDF5.

We've removed all flags for optional dependencies, such as `-DHIGHFIVE_USE_BOOST`. Instead, users who want to read from or write into optionally supported containers include a header with the corresponding name, and adjust their CMake code to link with the dependency.
The C++ code should have:
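For example, for Boost support (one of the optionally supported container libraries):

```cpp
#include <highfive/boost.hpp>
```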
and the CMake code would have
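A sketch, with a hypothetical consuming target `app`:

```cmake
find_package(Boost REQUIRED)
target_link_libraries(app PUBLIC HighFive::HighFive Boost::headers)
```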
There are extensive examples of project integration in tests/cmake_integration
, including how those projects in turn can be included in other projects. If these examples don't help, please feel free to open an Issue.
DataSpace::DataSpaceType

We've converted the enum
DataSpace::DataSpaceType
to an enum class
. We've added static constexpr
members dataspace_null
and dataspace_scalar
to DataSpace
. This minimizes the risk of breaking user code.
Note that objects of type DataSpace::DataSpaceType will no longer silently convert to an integer. This includes the two constants DataSpace::dataspace_{scalar,null}.
FileDriver and MPIOFileDriver

These have been deprecated to stick more closely to familiar HDF5 concepts. The FileDriver
is synonymous to FileAccessProps
; and MPIOFileDriver
is the same as:
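A sketch of the equivalent explicit property list (`MPIOFileAccess` is HighFive's MPI file-access property):

```cpp
auto fapl = FileAccessProps{};
fapl.add(MPIOFileAccess(MPI_COMM_WORLD, MPI_INFO_NULL));
```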
We felt that the savings in typing effort weren't worth introducing the concept of a "file driver". Removing the concept hopefully makes it easier to add a better abstraction for the handling of the property lists, when we discover such an abstraction.
Broadcasting

HighFive v2 had a feature whereby a dataset (or attribute) of shape [n, 1] could be read into a one-dimensional array automatically.

The feature is prone to accidentally not failing. Consider an array of shape [n, m], with, in general, n, m > 0. Hence, one should always be reading into a two-dimensional array, even if n == 1 or m == 1. However, due to broadcasting, if one of the dimensions (accidentally) happens to be one, then the checks won't fail. This isn't a bug, but it can hide a bug, for example if the tests happen to use [n, 1] datasets and a one-dimensional array.
Broadcasting in HighFive was different from broadcasting in NumPy. For reading into one-dimensional data, HighFive supports stripping all dimensions that equal 1. When extending the feature to multi-dimensional arrays, it gets tricky: we can't strip from both the front and the back. If we allowed stripping from both ends, an array such as [1, n, m] would read into [n, m] if m > 1, but into [1, n] (instead of [n, 1]) if (coincidentally) m == 1. For HighFive, avoiding being forced to read [n, 1] into std::vector<std::vector<T>> was more important than handling [1, n]: flattening the former requires copying everything, while the latter can be made flat by just accessing the first value. Therefore, HighFive had a preference to strip from the right, while NumPy adds 1s to the front/left of the shape.
In v3 we've removed broadcasting. Instead, users must use one of two alternatives: squeezing and reshaping. The examples shown will use datasets and reading, but the same works for attributes and writing.
Squeezing

Often we know that the kth dimension is 1, e.g. a column is [n, 1] and a row is [1, m]. In this case it's convenient to state: remove dimension k. The syntax to simultaneously remove the dimensions {0, 2} is:
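A sketch using the v3 `squeezeMemSpace` API (`dset` is an open dataset):

```cpp
std::vector<double> array;
dset.squeezeMemSpace({0, 2}).read(array);
```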
Which will read a dataset with dimensions [1, n, 1]
into an array of shape [n]
.
Reshape

Sometimes it's easier to state what the new shape must be. For this we have the syntax:
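A sketch using the v3 `reshapeMemSpace` API (`dims` holds the desired shape):

```cpp
dset.reshapeMemSpace(dims).read(array);
```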
To declare that array
should have dimensions dims
even if dset.getDimensions()
is something different.
Example:
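A sketch (assuming `getElementCount` returns the total number of elements):

```cpp
std::vector<double> array;
dset.reshapeMemSpace({dset.getElementCount()}).read(array);
```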
to read into a one-dimensional array.
There's one safe case where enforcing this seems needlessly strict: if the dataset is a multi-dimensional array with a single element, one should be able to read it into (write it from) a scalar.

The reverse, i.e. reading a scalar value in the HDF5 file into a multi-dimensional array, isn't supported, because for arrays with runtime-defined rank we can't deduce the correct shape, e.g. [1] vs. [1, 1, 1], when reading into an array.
File::Truncate and friends

In v2, File::{ReadOnly,Truncate,...} was an anonymous member enum of File. Effectively, its type was the same as an int.
To improve type-safety, we converted it into an enum class
called File::AccessMode
. In order to reduce the migration effort, we retained the ability to write: File::ReadOnly
.
Functions that accept a file access mode should be modernized as follows:
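A sketch of the signature change (the function name `foo` is hypothetical):

```cpp
// v2: the mode was effectively an integer
void foo(int mode);

// v3: accept the scoped enum instead
void foo(HighFive::File::AccessMode mode);
```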
Note: there's a caveat. The short-hand notation File::ReadOnly doesn't have an address, meaning one can't take its address or bind a const reference to it (this results in a linker error about a missing symbol File::ReadOnly). Use File::AccessMode::ReadOnly instead.
Object*Props

To our knowledge, these could not be used meaningfully. Please create an issue if you relied on them.