Welcome to Exdir’s documentation!¶
The Experimental Directory Structure (Exdir) is a proposed, open file format specification for experimental pipelines. Exdir uses the same abstractions as HDF5 and is compatible with the HDF5 Abstract Data Model, but stores data and metadata in directories instead of in a single file. Exdir uses file system directories to represent the hierarchy, with metadata stored in human-readable YAML files, datasets stored in binary NumPy files, and raw data stored directly in subdirectories. Furthermore, storing data in multiple files makes it easier to track for version control systems. Exdir is not a file format in itself, but a specification for organizing files in a directory structure. With the publication of Exdir, we invite the scientific community to join the development to create an open specification that will serve as many needs as possible and as a foundation for open access to and exchange of data.
Exdir is described in detail in our reasearch paper:
Experimental Directory Structure (Exdir): An Alternative to HDF5 Without Introducing a New File Format.
exdir is not a file format in itself, but rather a specification for a directory structure with NumPy and YAML files.
example.exdir (File, folder) │ attributes.yaml (-, file) │ exdir.yaml (-, file) │ ├── dataset1 (Dataset, folder) │ ├── data.npy (-, file) │ ├── attributes.yaml (-, file) │ └── exdir.yaml (-, file) │ └── group1 (Group, folder) ├── attributes.yaml (-, file) ├── exdir.yaml (-, file) │ └── dataset2 (Dataset, folder) ├── data.npy (-, file) ├── attributes.yaml (-, file) ├── exdir.yaml (-, file) │ └── raw (Raw, folder) ├── image0001.tif (-, file) ├── image0002.tif (-, file) └── ...
The above structure shows the name of the object, the type of the object in exdir and the type of the object on the file system as follows:
[name] ([exdir type], [file system type])
A dash (-) indicates that the object doesn’t have a separate internal representation in the format, but is used indirectly. It is however explicitly stored in the file system.
conda install -c cinpla exdir
For more, see Installation.
Quick usage example¶
>>> import exdir >>> import numpy as np >>> f = exdir.File("mytestfile.exdir")
The File object points to the root folder in the exdir file structure. You can add groups and datasets to it.
>>> my_group = f.require_group("my_group") >>> a = np.arange(100) >>> dset = f.require_dataset("my_data", data=a)
These can later be accessed with square brackets:
>>> f["my_data"] 10
Groups can hold other groups or datasets:
>>> subgroup = my_group.require_group("subgroup") >>> subdata = subgroup.require_dataset("subdata", data=a)
Datasets support array-style slicing:
>>> dset[0:100:10] memmap([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
Attributes can be added to files, groups and datasets:
>>> f.attrs["description"] = "My first exdir file" >>> my_group.attrs["meaning_of_life"] = 42 >>> dset.attrs["trial_number"] = 12 >>> f.attrs["description"] 'My first exdir file'
An exdir object contains two types of objects: datasets, which are array-like collections of data, and groups, which are directories containing datasets and other groups.
An exdir directory is created by:
>>> import exdir >>> import numpy as np >>> f = exdir.File("myfile.exdir", "w")
The File object containes many useful methods including
>>> data = np.arange(100) >>> dset = f.require_dataset("mydataset", data=data)
The created object is not an array but an exdir dataset. Like NumPy arrays, datasets have a shape:
>>> dset.shape (100,)
Also array-style slicing is supported:
>>> dset 0 >>> dset 10 >>> dset[0:100:10] memmap([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
For more, see File Objects and Datasets.
Groups and hierarchical organization¶
Every object in an exdir directory has a name, and they’re arranged in a POSIX-style hierarchy with
>>> dset.name '/mydataset'
The “directory” in this system are called groups.
The File object we created is itself a group, in this case the root group, named
>>> f.name '/'
Creating a subgroup is done by using
>>> grp = f.require_group("subgroup")
exdir.core.Group objects also have the
require_* methods like File:
>>> dset2 = grp.require_dataset("another_dataset", data=data) >>> dset2.name '/subgroup/another_dataset'
You retrieve objects in the file using the item-retrieval syntax:
>>> dataset_three = f['subgroup/another_dataset']
Iterating over a group provides the names of its members:
>>> for name in f: ... print(name) mydataset subgroup
Containership testing also uses names:
>>> "mydataset" in f True >>> "somethingelse" in f False
You can even use full path names:
>>> "subgroup/another_dataset" in f True >>> "subgroup/somethingelse" in f False
There are also the familiar
exdir.core.Group.iter() methods, as well as
For more, see Groups.
With exdir you can store metadata right next to the data it describes.
All groups and datasets can have attributes which are descibed by
Attributes are accessed through the
attrs proxy object, which again
implements the dictionary interface:
>>> dset.attrs['temperature'] = 99.5 >>> dset.attrs['temperature'] 99.5 >>> 'temperature' in dset.attrs True
For more, see Attributes.
The development of Exdir owes a great deal to other standardization efforts in science in general and neuroscience in particular, among them the contributors to HDF5, NumPy, YAML, PyYAML, ruamel-yaml, SciPy, Klusta Kwik, NeuralEnsemble, and Neurodata Without Borders.