In recent years, a community of scientists and computer programmers working in neutron and synchrotron facilities around the world came to the conclusion that a common data format would fulfill a valuable function in the scattering community. As instrumentation becomes more complex and data visualization becomes more challenging, individual scientists, or even institutions, have found it difficult to keep up with new developments. A common data format makes it easier both to exchange experimental results and to exchange ideas about how to analyze them. It promotes greater cooperation in software development and stimulates the design of more sophisticated visualization tools. Additional background information is given in A Brief History of the NeXus Format.
This section is designed to give a brief introduction to NeXus, the data format and tools that have been developed in response to these needs. It explains what a modern data format such as NeXus is and how to write simple programs to read and write NeXus files.
The programmers who produce intermediate files for storing analyzed data should agree on simple interchange rules.
The NeXus data format has four components:
A set of design principles to help people understand what is in the data files.
A set of data storage objects (Base Classes and Application Definitions) to allow the development of portable analysis software.
A set of subroutines (NeXus Utilities) to make it easy to read and write NeXus data files.
A scientific community to provide the scientific data, advice, and continued involvement with the NeXus standard. NeXus provides a forum for the scientific community to exchange ideas in data storage.
In addition, NeXus relies on a set of low-level file formats to actually store NeXus files on physical media. Each of these components is described in more detail in the Physical File Format section.
The NeXus Application-Programmer Interface (NAPI), which provides the set of subroutines for reading and writing NeXus data files, is described briefly in the section called “NAPI: The NeXus Application Programming Interface”. (Further details are provided in the NAPI chapter of Volume II of this documentation.) The principles guiding the design and implementation of the NeXus standard are described in the NeXus Design chapter. Base classes, which comprise the data storage objects used in NeXus data files, are detailed in the Base Classes chapter of Volume II of this documentation. Additionally, a brief list describing the set of NeXus Utilities available to browse, validate, translate, and visualise NeXus data files is provided in the NeXus Utilities chapter.
NeXus data files contain four types of entity: data groups, data fields, attributes, and links.
Data groups are like folders that can contain a number of fields and/or other groups.
Data fields can be scalar values or multidimensional arrays of a variety of sizes (1-byte, 2-byte, 4-byte, 8-byte) and types (characters, integers, floats). In HDF, fields are represented as HDF Scientific Data Sets (also known as SDS).
Extra information required to describe a particular group or field, such as the data units, can be stored as a data attribute.
Links are used to reference the plottable data from NXdata when the data is provided in other groups such as NXmonitor or NXdetector.
In fact, a NeXus file can be viewed as a computer file system. Just as files are stored in folders (or subdirectories) to make them easy to locate, so NeXus fields are stored in groups. The group hierarchy is designed to make it easy to navigate a NeXus file.
The following diagram shows an example of a NeXus file represented as a tree structure.
Note that each field is identified by a name, such as counts, but each group is identified both by a name and, after a colon as a delimiter, the class type (e.g., monitor:NXmonitor). The class types, which all begin with NX, define the sort of fields that the group should contain, in this case, counts from a beamline monitor. The hierarchical design, with data items nested in groups, makes it easy to identify information if you are browsing through a file.
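The four entity types and the name:class notation described above can be mimicked with a small in-memory model. This is only an illustrative sketch in plain Python, not the NeXus API: the Group and Field classes, the lookup and render helpers, and the sample values are all hypothetical. A link is modelled simply as a second reference to the same object.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Field:
    """A data field: a scalar or array plus its descriptive attributes."""
    name: str
    value: object
    attrs: Dict[str, object] = field(default_factory=dict)


@dataclass
class Group:
    """A data group: a named container with an NX class type."""
    name: str
    nxclass: str
    children: Dict[str, Union["Group", Field]] = field(default_factory=dict)

    def add(self, child):
        """Store a child group or field under its own name."""
        self.children[child.name] = child
        return child

    def link(self, target, name=None):
        """Model a link as a second reference to an existing object."""
        self.children[name or target.name] = target
        return target


def lookup(root, path):
    """Walk a file-system style path such as 'monitor/counts' below root."""
    node = root
    for part in path.strip("/").split("/"):
        node = node.children[part]
    return node


def render(group, indent=0):
    """Return tree lines, labelling each group as name:NXclass."""
    lines = [" " * indent + f"{group.name}:{group.nxclass}"]
    for key, child in group.children.items():
        if isinstance(child, Group):
            lines.extend(render(child, indent + 2))
        else:
            lines.append(" " * (indent + 2) + key)
    return lines


# Illustrative values only.
entry = Group("entry", "NXentry")
monitor = entry.add(Group("monitor", "NXmonitor"))
counts = monitor.add(Field("counts", [1193, 4474, 53220], {"units": "counts"}))
data = entry.add(Group("data", "NXdata"))
data.link(counts)  # the data lives in NXmonitor; NXdata only links to it

print("\n".join(render(entry)))
```

Because the link shares the underlying object, lookup(entry, "data/counts") and lookup(entry, "monitor/counts") return the same field, which is the behaviour the link entity is meant to provide.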
Here are some of the important classes found in nearly all NeXus files. A complete list can be found in the NeXus Design chapter.
Note that NXentry and NXdata are the only two classes necessary to store the minimum amount of information in a valid NeXus data file.
NXentry
Required: The top level of any NeXus file contains one or more groups with the class NXentry. These contain all the data that is required to describe an experimental run or scan. Each NXentry typically contains a number of groups describing sample information (class NXsample), instrument details (class NXinstrument), and monitor counts (class NXmonitor).
NXdata
Required: Each NXentry group contains one or more groups with class NXdata. These groups contain the experimental results in a self-contained way, i.e., it should be possible to generate a sensible plot of the data from the information contained in each NXdata group. That means it should contain the axis labels and titles as well as the data.
NXsample
An NXentry group will often contain a group with class NXsample. This group contains information pertaining to the sample, such as its chemical composition, mass, and environment variables (temperature, pressure, magnetic field, etc.).
NXinstrument
There might also be a group with class NXinstrument. This is designed to encapsulate all the instrumental information that might be relevant to a measurement, such as flight paths, collimations, chopper frequencies, etc.
Since an instrument can comprise several beamline components, each defined by several parameters, each component is specified by a separate group. This hides the complexity from generic file browsers, but makes the information available in an intuitively obvious way if it is required.
NeXus data files do not need to be complicated. The following diagram shows an extremely simple NeXus file, containing only the minimum information necessary for a NeXus data file, that could be used to transfer data between programs. (Later in this section, we show how to write and read this simple example.)
This illustrates the fact that the structure of NeXus files is extremely flexible. It can accommodate very complex instrumental information, if required, but it can also be used to store very simple data sets. In the next example, a NeXus data file is shown as XML:
Example 1.1. verysimple.xml: A very simple NeXus Data file (in XML)
<?xml version="1.0" encoding="UTF-8"?>
<NXroot NeXus_version="4.3.0" XML_version="mxml"
    file_name="verysimple.xml"
    xmlns="http://definition.nexusformat.org/schema/3.1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://definition.nexusformat.org/schema/3.1
        http://definition.nexusformat.org/schema/3.1/BASE.xsd"
    file_time="2010-11-12T12:40:17-06:00">
  <NXentry name="entry">
    <NXdata name="data">
      <counts NAPItype="NX_INT64[15]" long_name="photodiode counts" signal="NX_INT32:1" axes="two_theta">
        1193 4474
        53220 274310
        515430 827880
        1227100 1434640
        1330280 1037070
        598720 316460
        56677 1000
        1000
      </counts>
      <two_theta NAPItype="NX_FLOAT64[15]" units="degrees" long_name="two_theta (degrees)">
        18.90940 18.90960 18.90980 18.91000
        18.91020 18.91040 18.91060 18.91080
        18.91100 18.91120 18.91140 18.91160
        18.91180 18.91200 18.91220
      </two_theta>
    </NXdata>
  </NXentry>
</NXroot>
NeXus files are easy to create. This example NeXus file was created using a short Python program and NeXpy:
Example 1.2. verysimple.py: Using NeXpy to write a very simple NeXus Data file (in HDF5)
#
# This example uses NeXpy to build the verysimple.nx5 data file.
from nexpy.api import nexus
angle = [18.9094, 18.9096, 18.9098, 18.91, 18.9102,
         18.9104, 18.9106, 18.9108, 18.911, 18.9112,
         18.9114, 18.9116, 18.9118, 18.912, 18.9122]
diode = [1193, 4474, 53220, 274310, 515430, 827880,
         1227100, 1434640, 1330280, 1037070, 598720,
         316460, 56677, 1000, 1000]
two_theta = nexus.SDS(angle, name="two_theta",
                      units="degrees",
                      long_name="two_theta (degrees)")
counts = nexus.SDS(diode, name="counts", long_name="photodiode counts")
data = nexus.NXdata(counts, [two_theta])
data.nxsave("verysimple.nx5")
# The verysimple.xml file was built with this command:
# nxconvert -x verysimple.nx5 verysimple.xml
# and then hand-edited (line breaks) for display.
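Reading the XML form back does not require NeXus software. The following sketch uses only the Python standard library on an abridged copy of Example 1.1 and locates the plottable data from the signal and axes attributes. It rests on several assumptions: that a signal value ending in 1 marks the plottable field, that multiple axis names would be separated by a colon or comma, and that the NAPItype prefix is enough to choose between integers and floats. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of Example 1.1.
NS = "{http://definition.nexusformat.org/schema/3.1}"

# Abridged copy of verysimple.xml (XML declaration and schema hints omitted).
XML_TEXT = """
<NXroot NeXus_version="4.3.0"
    xmlns="http://definition.nexusformat.org/schema/3.1">
  <NXentry name="entry">
    <NXdata name="data">
      <counts NAPItype="NX_INT64[15]" long_name="photodiode counts"
              signal="NX_INT32:1" axes="two_theta">
        1193 4474 53220 274310 515430 827880 1227100 1434640
        1330280 1037070 598720 316460 56677 1000 1000
      </counts>
      <two_theta NAPItype="NX_FLOAT64[15]" units="degrees"
                 long_name="two_theta (degrees)">
        18.90940 18.90960 18.90980 18.91000 18.91020
        18.91040 18.91060 18.91080 18.91100 18.91120
        18.91140 18.91160 18.91180 18.91200 18.91220
      </two_theta>
    </NXdata>
  </NXentry>
</NXroot>
"""


def local_name(element):
    """Strip the '{namespace}' prefix from an element tag."""
    return element.tag.split("}")[-1]


def parse_values(element):
    """Convert whitespace-separated text to numbers using the NAPItype hint."""
    is_float = element.get("NAPItype", "").startswith("NX_FLOAT")
    convert = float if is_float else int
    return [convert(token) for token in element.text.split()]


def find_plottable(nxdata):
    """Return (signal_element, [axis_elements]) for an NXdata element."""
    for child in nxdata:
        # Assumption: the value after any type prefix is 1 for the signal.
        if child.get("signal", "").split(":")[-1] == "1":
            raw = child.get("axes", "").replace(",", ":")
            names = [name for name in raw.split(":") if name]
            return child, [nxdata.find(NS + name) for name in names]
    raise ValueError("no field is marked as the signal")


root = ET.fromstring(XML_TEXT)
nxdata = root.find(f"{NS}NXentry/{NS}NXdata")
signal, axes = find_plottable(nxdata)
counts = parse_values(signal)
two_theta = parse_values(axes[0])

# Index of the largest count, used to report the peak position.
peak = max(range(len(counts)), key=counts.__getitem__)
print(f"{local_name(signal)} vs {local_name(axes[0])}: {len(counts)} points")
print(f"peak of {counts[peak]} at {two_theta[peak]} {axes[0].get('units')}")
```

The point of the sketch is that the NXdata group carries everything needed for a plot: the data, the axis it should be plotted against, and the units and labels stored as attributes.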
If the design principles are followed, it will be easy for anyone browsing a NeXus file to understand what it contains, without any prior information. However, if you are writing specialized visualization or analysis software, you will need to know in advance precisely what information the file contains. For that reason, NeXus provides a way of defining the format for particular instrument types, such as time-of-flight small angle neutron scattering. This requires some agreement by the relevant communities, but enables the development of much more portable software.
The set of data storage objects is divided into three parts: base classes, application definitions, and contributed definitions. Each base class represents a component and defines the dictionary of all possible terms to be used with that component. The application definitions specify the minimum required information to satisfy a particular scientific or data analysis software interest. The contributed definitions have been submitted by the scientific community, either for incubation before they are adopted by the NIAC or simply for availability to the community.
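The difference between a dictionary of possible terms and a minimum required set can be illustrated with two sets. The sketch below is hypothetical throughout: the term lists are invented for illustration and are not the real NeXus definitions, and review_fields is not part of any NeXus tool.

```python
# Hypothetical, abridged vocabularies: NOT the real NeXus definitions.
BASE_CLASS_TERMS = {"counts", "two_theta", "errors", "title"}  # all possible terms
APPLICATION_REQUIRED = {"counts", "two_theta"}                 # minimum required


def review_fields(field_names,
                  allowed=BASE_CLASS_TERMS,
                  required=APPLICATION_REQUIRED):
    """Report required terms that are absent and names outside the dictionary."""
    present = set(field_names)
    return {
        "missing": sorted(required - present),
        "outside_dictionary": sorted(present - allowed),
    }


# A group that meets the minimum and adds one optional dictionary term.
print(review_fields(["counts", "two_theta", "title"]))

# A group that lacks a required term and uses a name the dictionary lacks.
print(review_fields(["counts", "voltage"]))
```

In this picture, analysis software written against an application definition only needs the "missing" list to be empty; names outside the dictionary are reported for information rather than treated as errors.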
These instrument definitions are formalized as XML files, using NXDL (as described in the NXDL chapter in Volume II of this documentation), to specify the names of data fields and other NeXus data objects. The following is an example of such a file for the simple NeXus file shown above.
Example 1.3. verysimple.nxdl.xml: A very simple NeXus Definition Language (NXDL) file
<?xml version="1.0" ?>
<definition
    xmlns="http://definition.nexusformat.org/nxdl/3.1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://definition.nexusformat.org/nxdl/3.1 ../nxdl.xsd"
    category="base"
    name="verysimple"
    version="1.0"
    svnid="$Id: verysimple.nxdl.xml 730 2010-11-12 18:40:01Z Pete Jemian $"
    type="group" extends="NXobject">
  <doc>
    A very simple NeXus NXDL file
  </doc>
  <group type="NXentry">
    <group type="NXdata">
      <field name="counts" type="NX_INT" units="NX_UNITLESS">
        <doc>counts recorded by detector</doc>
      </field>
      <field name="two_theta" type="NX_FLOAT" units="NX_ANGLE">
        <doc>rotation angle of detector arm</doc>
      </field>
    </group>
  </group>
</definition>
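Because an NXDL file is ordinary XML, software can read the declared fields directly from it. The sketch below, using only the Python standard library on an abridged copy of Example 1.3, collects each field with its declared type and units and then checks sample values against the declared type. The mapping of NX_INT and NX_FLOAT onto Python int and float is an assumption made for illustration, and the helper names are hypothetical. This is not the official NeXus validation procedure.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of Example 1.3.
NXDL_NS = "{http://definition.nexusformat.org/nxdl/3.1}"

# Abridged copy of verysimple.nxdl.xml (schema hints and svnid omitted).
NXDL_TEXT = """
<definition xmlns="http://definition.nexusformat.org/nxdl/3.1"
    category="base" name="verysimple" version="1.0"
    type="group" extends="NXobject">
  <doc>A very simple NeXus NXDL file</doc>
  <group type="NXentry">
    <group type="NXdata">
      <field name="counts" type="NX_INT" units="NX_UNITLESS">
        <doc>counts recorded by detector</doc>
      </field>
      <field name="two_theta" type="NX_FLOAT" units="NX_ANGLE">
        <doc>rotation angle of detector arm</doc>
      </field>
    </group>
  </group>
</definition>
"""

# Assumption for illustration: a simple mapping onto Python types.
PYTHON_TYPES = {"NX_INT": int, "NX_FLOAT": float}


def declared_fields(nxdl_text):
    """Return {class path: (type, units, doc)} for every declared field."""
    found = {}

    def walk(element, path):
        for child in element:
            if child.tag == NXDL_NS + "group":
                walk(child, path + [child.get("type")])
            elif child.tag == NXDL_NS + "field":
                doc = child.findtext(NXDL_NS + "doc", default="").strip()
                key = "/".join(path + [child.get("name")])
                found[key] = (child.get("type"), child.get("units"), doc)

    walk(ET.fromstring(nxdl_text), [])
    return found


def conforms(values, nx_type):
    """True when every value has the Python type mapped to nx_type."""
    expected = PYTHON_TYPES[nx_type]
    return all(type(value) is expected for value in values)


fields = declared_fields(NXDL_TEXT)
for path, (nx_type, units, doc) in fields.items():
    print(f"{path}: {nx_type} [{units}] {doc}")

counts_type = fields["NXentry/NXdata/counts"][0]
print(conforms([1193, 4474, 53220], counts_type))
print(conforms([1193.5], counts_type))
```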
For complete examples of reading and writing NeXus data files, refer to the Examples of reading or writing NeXus data files chapter in Volume II.
If you want to define the format of a particular type of NeXus file for your own use, e.g., as the standard output from a program, you are encouraged to publish the format using this XML format. An example of how to do this is shown in the section titled Creating a NXDL Specification.
NeXus data files are high-level so the user only needs to know how the data are referenced in the file but does not need to be concerned where the data are stored in the file. Thus, the data are most easily accessed using a subroutine library tuned to the specifics of the data format.
In the past, a data format was defined by a document describing the precise location of every item in the data file, either as row and column numbers in an ASCII file, or as record and byte numbers in a binary file. In NeXus, by contrast, it is the job of the subroutine library to retrieve the data. This subroutine library is commonly called an application-programmer interface or API.
For example, in NeXus, a program to read in the wavelength of an experiment would contain lines similar to the following:
In this example, the program requests the value of the data that has the label wavelength, storing the result in the variable lambda. fileID is a file identifier that is provided by NeXus when the file is opened.
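The contrast between locating data by position and requesting it by label can be sketched in Python. Everything here is a toy stand-in: ToyFile and its opendata, getdata, and closedata methods are hypothetical and only echo an open/get/close pattern, they are not NAPI, and the record layout and values are invented. Since lambda is a reserved word in Python, the result is stored in a variable named wavelength instead.

```python
import struct

# Positional style: the reader must know the wavelength is an 8-byte float
# stored at byte 4, right after a 4-byte run number (invented layout).
record = struct.pack("<id", 42, 1.54)
wavelength_by_position = struct.unpack_from("<d", record, 4)[0]


class ToyFile:
    """Hypothetical stand-in for a file handle with name-based access."""

    def __init__(self, items):
        self._items = dict(items)
        self._open_name = None

    def opendata(self, name):
        """Select a data item by its label."""
        if name not in self._items:
            raise KeyError(name)
        self._open_name = name

    def getdata(self):
        """Return the value of the currently selected data item."""
        if self._open_name is None:
            raise RuntimeError("no data item is open")
        return self._items[self._open_name]

    def closedata(self):
        """Deselect the current data item."""
        self._open_name = None


# Name-based style: the program only needs the label 'wavelength'.
file_id = ToyFile({"run_number": 42, "wavelength": 1.54})
file_id.opendata("wavelength")
wavelength = file_id.getdata()
file_id.closedata()

print(wavelength_by_position, wavelength)
```

If the positional layout changes, for example by adding another item before the wavelength, the byte offset in the first reader breaks, while the name-based reader is unaffected.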
We shall provide a more complete example when we have discussed the contents of the NeXus files.
NeXus began as a group of scientists with the goal of defining a common data storage format to exchange experimental results and to exchange ideas about how to analyze them.
The NeXus Scientific Community provides the scientific data, advice, and continued involvement with the NeXus standard. NeXus provides a forum for the scientific community to exchange ideas in data storage through the NeXus wiki.
The NeXus International Advisory Committee (NIAC) supervises the development and maintenance of the NeXus common data format for neutron, x-ray, and muon science. The NIAC supervises a technical committee that oversees the NeXus Application Programmer Interface (NAPI) and the NeXus class definitions.