aguirreg

File Formats and Data I/O

During the course of conducting and analyzing fMRI experiments, several different types of data are created and used, ranging from simple vectors to four-dimensional time series of 3D images. VoxBo has specialized routines to manage the storage of all of these types of data. We will review here the types of data formats used in VoxBo and how they may be created, read and modified.

In general, data can be stored in either text or binary format. Text data (typically ASCII data) is the kind of stuff you can read by looking at it. It's easy to work with because you can view and modify the contents of a file using any text editor (e.g., emacs or vi) and you can use utilities like more to get a quick look at what's inside. A drawback of text, however, is that it is space inefficient. Four bytes in text format can be used to represent any positive integer from 0 to 9999. Four bytes in a typical binary format will get you anything from 0 to 2^32-1. If every byte, or 8 bits, represents a single digit, it's like throwing away about 4.68 of those bits. Another drawback is that it is burdensome to translate numbers from a code designed for people to the binary form used by the computer.
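
The arithmetic behind those figures can be checked with a few lines of Python. This is only a worked example of the numbers quoted above; the variable names are ours and have nothing to do with the VoxBo code.

```python
import math

# Information carried by one decimal digit (0-9), in bits.
bits_per_digit = math.log2(10)

# A text digit occupies a full 8-bit byte, so the rest goes unused.
wasted_bits = 8 - bits_per_digit

# Largest value that fits in four bytes under each scheme.
max_text = 10**4 - 1      # four ASCII digits
max_binary = 2**32 - 1    # one unsigned 32-bit integer

print(f"bits per digit: {bits_per_digit:.2f}")
print(f"wasted bits per byte: {wasted_bits:.2f}")
print(max_text, max_binary)
```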

Binary data is more compact and is often the same format used by the computer internally. Or something close. But you can't just open it up in a text editor. You need some kind of special program that knows how the data are structured.

VoxBo file formats use both text and binary data. For most types of data (with the exception of the REF format, described below), VoxBo stores the data itself in binary format, but attaches a formatted, readable text header to the file. A header is the part of a file that contains information about the file including (but not limited to) the dimensions of the data, the data type (i.e., byte, integer, float, etc.) and a label for the data. Thus, using the more command or emacs, you will always be able to examine a VoxBo file to determine some basic information about the file by reading the header. A great deal of information is stored in the header by the various VoxBo programs. Additionally, you may add to the header using a text editor if you wish to attach notes or other information to the file. To do so safely (i.e., to avoid corrupting your file) you will need to know a little bit more about the structure of the text header.

Headers

The comments below apply to VoxBo data files that contain 2, 3 and 4 dimensional data. The header is composed of two parts: an initial portion that is created by the VoxBo I/O software and cannot be modified directly by the end user (termed the Fixed Header), and a latter portion (termed the User Header) that can be modified safely. The Header portion of the file is followed by a control-L character (form feed, ASCII 12) that separates the header from the data. A control-L is used because the more command stops output when it reaches the control-L, allowing the user to then hit “q” to exit prior to display of the binary portion of the file. The figure below presents a schematic outline of the VoxBo file format for an example file.

The Fixed Header begins with a Creator and Kind code. The Creator code for this release of the software is VB98. Any file that does not begin with the VB98 creator code will not be read by the VoxBo software. The Kind code is used by VoxBo to identify the contents of the file. The five Kinds currently in use are listed in the table below.

Following the Creator and Kind codes, the Fixed Header always lists the DataType (i.e., integer, float, etc.) and the VoxDims (the number of elements in each dimension of the array). It is very important that these components of the Fixed Header never be edited by the user, as their modification will render the file unreadable by the VoxBo software.

The User Header, which begins immediately after the VoxDims line, can be modified by the user with relative impunity. It should be remembered, however, that the VoxBo programs store information within the User Header that is used to properly display and analyze the file. The user can modify the Header to control this behavior, but should do so with some knowledge of the impact of his/her actions. This being said, lines can be added to the User Header without fear if additional notation for the file is simply desired.

Kind	Dimensions	Header	Body	Data-type
TES1	4	text	binary	numbers
CUB1	3	text	binary	numbers
MAT1	2	text	binary	numbers
REF1	1	text	text	numbers
TXT1	1	text	text	strings
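
As a rough illustration of the header layout described above, here is a minimal Python sketch that splits a file's contents at the control-L and checks for the VB98 creator code. It is not the VoxBo I/O code. It assumes the Creator and Kind codes sit on the first two header lines, and the example bytes are invented for illustration; the exact syntax of the DataType and VoxDims lines is not covered here.

```python
FORM_FEED = b"\x0c"  # control-L, ASCII 12


def split_header(raw):
    """Split raw file contents into (header_lines, body_bytes) at the first control-L."""
    header, sep, body = raw.partition(FORM_FEED)
    if not sep:
        raise ValueError("no control-L separator found")
    return header.decode("ascii").splitlines(), body


def has_creator_code(lines):
    """True if the first header line starts with the VB98 creator code."""
    return bool(lines) and lines[0].startswith("VB98")


# Hypothetical file contents, made up for this example only.
example = b"VB98\nCUB1\nsome user note\n\x0c\x00\x01\x02\x03"
lines, body = split_header(example)
print(lines)
print(has_creator_code(lines), len(body))
```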

Specialized file types

There are a few aspects of the VoxBo formats that are specialized for the fMRI data that we typically encounter.

The TES file format

TES files store the image data acquired by the scanner. Two aspects of TES files are worthy of note. First, TES files store the data in a TIME, X, Y, Z array order. This is done so that all the time-series data for a given voxel are physically arranged on the hard-drive in a linear fashion. As a result, the analysis programs can load into memory just the time-series data from a given voxel of interest without needing to load the rest of the TES file into memory. This produces a considerable reduction in processing time. Second, a rudimentary form of data compression is applied to TES files. This compression is realized by only storing data for voxels that have non-zero values over the time-dimension. As a result, data from voxels located outside of the brain volume are not stored. This generally produces a 40-60% reduction in the size of the files.
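
To see why the time-first ordering helps, consider where one voxel's time series would sit in the data block. The Python sketch below is an illustration only: it assumes that time varies fastest, that stored voxels follow one another without gaps, and that offsets are counted from the start of the time-series data rather than the start of the file. The function name is hypothetical.

```python
def timeseries_span(voxel_index, n_timepoints, bytes_per_value):
    """Return (start, end) byte offsets of one stored voxel's time series,
    relative to the start of the time-series data."""
    length = n_timepoints * bytes_per_value
    start = voxel_index * length
    return start, start + length


# Third stored voxel (index 2), 160 time points, 2-byte integers.
start, end = timeseries_span(2, 160, 2)
print(start, end)
```

Under these assumptions a program needs a single seek and one contiguous read per voxel, rather than one read per image.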

A Mask is stored within the TES file to indicate the location of voxels that have been stored. It is located immediately after the text header and before the start of the time-series data. The mask is a three-dimensional array of byte values, matching the TES file in size in X, Y and Z. The mask is set to one at each coordinate at which time-series data are stored.
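
The idea behind the mask can be sketched in a few lines of Python. This toy version keeps each voxel's time series in a dictionary keyed by (x, y, z) and visits the voxels in sorted order; both choices are ours for the sake of illustration and say nothing about how the TES file actually arranges its voxels on disk.

```python
def build_mask_and_store(series_by_voxel):
    """Return (mask, stored): mask maps each coordinate to 0 or 1, and
    stored holds the time series of the voxels whose mask value is 1."""
    mask, stored = {}, []
    for coord in sorted(series_by_voxel):
        series = series_by_voxel[coord]
        # Keep the voxel only if it is non-zero at some point in time.
        keep = any(value != 0 for value in series)
        mask[coord] = 1 if keep else 0
        if keep:
            stored.append(list(series))
    return mask, stored


# Toy data: four voxels, three time points, two voxels outside the "brain".
toy = {
    (0, 0, 0): [0, 0, 0],
    (1, 0, 0): [5, 7, 6],
    (0, 1, 0): [0, 0, 0],
    (1, 1, 0): [9, 0, 8],
}
mask, stored = build_mask_and_store(toy)
print(sum(mask.values()), "of", len(mask), "voxels stored")
```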

The REF and TXT file formats

Typically, the user will wish to view and edit the types of data that are stored as one dimensional arrays. These include Condition Functions (described under the Analysis section), subject lists, and time-series data saved from the various display routines. To facilitate this, VoxBo uses a format for one-dimensional data that is quite different from that used for data of higher dimensions. For one-dimensional data, both header information and the data itself are stored as formatted text. This makes it very easy to create and read VoxBo REF files using text editors (or programs like Excel).

The REF file format is used for data files that store lists of numbers while TXT files store lists of strings (text data). In both cases, each value in the list is separated by a carriage return. The headers on REF and TXT files are typically simpler than those found in other data types. All non-data information in REF and TXT files is indicated as such by a Comment Character. Acceptable comment characters are the semicolon (;) and the pound sign (#). These comment characters should begin any line that is not to be treated as data. The first two lines of all REF and TXT files provide the Creator and Kind codes required by VoxBo. These codes must also be preceded by a Comment Character. Thus, a typical REF file might look like this:

;VB98
;REF1
;
; This is the condition function for the Bart Letter Load experiment.
;   Code:    0:  ITI     1:  Stimulus      2: Delay       3: Probe
;

1
0
2
0
3
0
0
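
Because REF files are plain text, they are easy to read from other languages. The following Python sketch parses text like the example above. It is not part of VoxBo: the function name is hypothetical, and skipping blank lines and accepting comment lines anywhere in the file are assumptions on our part.

```python
COMMENT_CHARS = (";", "#")


def parse_ref(text):
    """Parse REF-style text into (comments, values)."""
    comments, values = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # assumption: blank lines carry no data
        if stripped[0] in COMMENT_CHARS:
            comments.append(stripped[1:].strip())
        else:
            values.append(float(stripped))
    # The first two comment lines must be the Creator and Kind codes.
    if comments[:2] != ["VB98", "REF1"]:
        raise ValueError("missing VB98/REF1 codes")
    return comments, values


sample = """;VB98
;REF1
;
; This is the condition function for the Bart Letter Load experiment.
;

1
0
2
0
3
0
0
"""
comments, values = parse_ref(sample)
print(values)
```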

Directory organization

While not a file format issue per se, it is important to be familiar with the standard organization of data files in the VoxBo scheme. The basic unit of data addressed by the software is the TES file, which stores the data from a single scan. A scan is arbitrarily defined as a finite period of data collection and is practically realized as the data that is stored in a single raw (k-space) data file produced by the GE scanner (typically, 160-200 images). During the course of a scanning session, several scans may be conducted. After data conversion, each scan will be stored as a TES file within a folder labeled with the subject's name and numbered in order. The figure below illustrates the typical layout of a data directory. This directory illustrates what you might find after converting the data from a single subject (named “mysubject”) who was scanned three times.

The first thing to note is that there are several files associated with the TES file within the folder. These files are, in most cases, files of the REF format that contain values that correspond to the points in time represented by the TES file. For example, the _GS.ref file contains the global signal calculated for the TES file, which is the simple average of all of the parenchymal voxel values at each point in time. Other files include the _PS.ref file (contains the time-domain representation of the grand-average power spectrum of the brain voxels in the TES file) and the _MoveParams.ref file (contains the X, Y, Z, Pitch, Roll and Yaw values for the imaged volume over time).

The Anatomy folder contains files that provide the structural images on which functional data are typically overlaid. Included in this directory is the locs directory, which contains the original, extracted images from the GE scanner. Typically, there is little need to directly view these files. Instead, the Anatomy folder contains a number of processed versions of these extracted images. In the example provided in the figure, a file called AxialT1s.cub contains the assembled set of axial slices acquired prior to functional scanning. Additionally, the EPIs.cub holds the “scout” EPI series that was acquired immediately prior to the distortion correction run. The EPIs.cub file has been itself distortion corrected, and typically serves as the target of realignment for motion correction of the TES files.

Finally, VoxBo creates a directory called Models. It is suggested that you store the analysis models that you create within this folder, along with, for example, fits to a subject's 1/f power spectrum.

Technical: IDL I/O access

The IDL code that controls file access for VoxBo can be found in the file VoxBo_IO.pro. Two core routines handle data writing and reading. There are several additional routines that allow for further access to VoxBo files. These routines are well commented within the VoxBo_IO.pro file.

WriteFile

WriteFile is used to transfer data from memory to the hard drive. It is a procedure with the following usage:

WriteFile, Data, [FileName], [UserHeader], [/TIFF]

Data is the name of a variable that contains the information to be written. It can be between one and four dimensions (including scalars), and of almost any data type. FileName is the full path to the file to be created. If not specified, the routine will prompt the user for a file name either through a pickfile graphical interface (if accessed through a graphics-capable terminal and the NOGUI keyword is not set) or through a command-line, text interface. UserHeader is a string variable that contains the text header to be placed after the fixed header and before the data itself. It typically consists of an array of strings, themselves composed of several tokens separated by tabs. It may be left undefined if no user header is desired. WriteFile also accepts the standard three keywords: NOGUI, QUIET, and ErrorFlag. The TIFF keyword can be set to indicate that the file should be saved in TIFF format. Note that when TIFF is set, an error will be returned if the data variable is not a two-dimensional array of byte values.

ReadFile

ReadFile is used to transfer information from the hard drive to memory. It is a function with the following usage:

MyData = ReadFile ( [FileName] )

FileName is the full path to the file to be read. If not specified, the routine will prompt the user for a file name either through a pickfile graphical interface (if accessed through a graphics-capable terminal and the NOGUI keyword is not set) or through a command-line, text interface. ReadFile also accepts the standard three keywords.

ReadHeader

ReadHeader is used to obtain information about the fixed and user headers of a VoxBo file. It is a function:

MyHeader = ReadHeader ( [FileName], [Mask] )

FileName is the full path to the file to be read. If not specified, the routine will prompt the user for a file name either through a pickfile graphical interface (if accessed through a graphics-capable terminal and the NOGUI keyword is not set) or through a command-line, text interface.

The Mask parameter can be set to any named variable. If the file to be read is a TES file, then the Mask variable will be set to the three-dimensional array of byte values that indicates the position of non-zero values within the TES file.

ReadHeader also accepts the standard three keywords.

The function returns a structure with several named fields. The fields contain the “parsed” information from the fixed and user headers. If, for example, you wished to find out what information regarding the Origin of the volume is present in the header, you would type:

print, MyHeader.origin

Note that one of the named fields is .UserHeader. This field contains the unparsed user header from the file. This is useful if you wish to modify the data in some way, and then write the file back with the same (or perhaps modified) user header. A complete list of the fields present in the structure returned by ReadHeader can be obtained by issuing the IDL command:

help, MyHeader, /STRUCT

Shell access

When accessed in the text mode, ReadFile, WriteFile and ReadHeader have built-in shell access abilities. At the I/O prompt:

File to read:

or

File to write:

you may issue commands to the shell using the $ prefix. For example:

$ls
$cd ..

Technical: a note on endian-related issues

Because we're feeling expansive, a little background first. Numbers, as stored in computers, typically take anywhere from two to eight bytes of storage. A reasonable question to ask is what order the bytes come in. That is, do the most significant or the least significant bytes come first?

We tend to think about and write out numbers with the most significant digit first. In decimal numbers, 92 is larger than 29, even though both numbers are made out of the same two digits. In binary, 00001111 is smaller than 11110000. With bytes represented as pairs of hexadecimal characters, the short integer 9876 (two bytes, or 16 bits) is larger than 7698, because the first byte is more significant.

But computers aren't like people, and they could conceivably store numbers in either order. And unfortunately, not all computers store numbers the same way. To represent the sixteen-bit (two byte) integer we think of (in hexadecimal) as 9876, some machines store in consecutive addresses the bytes 98 and 76. This is known as big-endian architecture, and is what you'll find on Sun workstations and Macintoshes. Other computers, most notably Intel-based systems, store their numbers the other way. The number we think of as 9876 is actually stored with 76 in the first memory address and 98 in the next. This is little-endian architecture. (Note that we're not interested in the order of the 9 and the 8 – they are part of the same byte.)
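
The two layouts are easy to see with Python's standard struct module, which lets you ask for either byte order explicitly. This is just a demonstration of the 9876 example above, not VoxBo code.

```python
import struct
import sys

value = 0x9876  # the sixteen-bit integer from the text, in hexadecimal

big = struct.pack(">H", value)     # big-endian: most significant byte first
little = struct.pack("<H", value)  # little-endian: least significant byte first

print(big.hex(), little.hex())  # 9876 7698

# Reading little-endian bytes as if they were big-endian gives the wrong number.
misread = struct.unpack(">H", little)[0]
print(hex(misread))  # 0x7698

# Byte order of the machine running this script ('little' or 'big').
print(sys.byteorder)
```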

This incompatibility is a headache for anyone interested in portability of data or software. In the most degenerate case, you could receive a file knowing only that it contains unsigned four-byte (long) integers and have two equally valid interpretations of its contents. There do exist legitimate file formats that are defined only as containing (for example) two-byte integers, which means that the format of the data in a given file depends on the machine used to create it.

Fortunately, there are a few remedies that can help work around this problem. Most obvious is byte-swapping, or simply reversing the order of the bytes in memory. If a file format is decreed to be big-endian, software running on little-endian computers will have to swap bytes before writing to (and after reading from) disk. There are simple ways to do this in most programming languages that are transparent to the user, and for our purposes, the time it takes to byte-swap is negligible.

What does this mean for VoxBo? Since we'd like VoxBo to run comfortably in mixed environments, we declare all VoxBo file formats to be big-endian, and we require all VoxBo components to byteswap when necessary. This means that even though our entire lab is composed of Intel-based GNU/Linux boxes, we have to byte-swap all our data whenever it's read or written. All VoxBo code that reads or writes binary files is required to do this conversion. This system offers a huge benefit in cross-platform compatibility, data-sharing, etc. And since the amount of time required to byte-swap even very large arrays of data is negligible (compared to the time it takes to do anything meaningful with the data), we don't feel badly about this. Of course, future extensions to our file formats may include an optional endianness flag in the header, so that we can maintain compatibility across platforms but still keep each file in a preferred byte order. However, we will leave big-endian as the default.

Lastly, this section would be incomplete without some mention of floating point formats. In principle, floating point formats are more complicated, and cannot easily be translated between architectures merely by byte-swapping. Fortunately, in the two architectures we've used (Intel and Sun SPARC), byte-swapping does the trick. That is, byte-order aside, they adhere to similar specifications for floating point representation. We got lucky. But to be more rigorous about things, the official VoxBo binary data format is that of the XDR standard, a specification from Sun that can easily be read or written on virtually any UNIX-like platform. It just so happens that this format is what you get if you do nothing on Suns or if you do byte-swapping on Linux machines. So we currently try to get by with byte-swapping, but if we hear about someone using a machine with incompatible floats, we will probably do the responsible thing and switch to using XDR functions instead. If you want to write compatible code that needs to do this conversion, look at the documentation for the BYTEORDER function in IDL and look at the man pages for htons and the xdr functions for C code. These functions are capable of reading data in VoxBo format and converting it to your machine's internal format, and vice-versa. That is, they do absolutely nothing on Suns, but they byte-swap on Intel machines. Now forget about byte order and read about the file formats themselves.
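
For those writing their own readers outside of IDL or C, here is a minimal Python sketch of the same idea for four-byte floats, using the standard struct module. It assumes IEEE single-precision values, which is what struct's "f" format produces in standard mode, and the helper names are hypothetical. It shows that asking for big-endian output gives the same bytes as byte-swapping the little-endian form.

```python
import struct


def to_big_endian_floats(values):
    """Pack a list of numbers as big-endian four-byte floats."""
    return struct.pack(f">{len(values)}f", *values)


def from_big_endian_floats(raw):
    """Unpack big-endian four-byte floats into a list of Python floats."""
    count = len(raw) // 4
    return list(struct.unpack(f">{count}f", raw))


data = [1.0, -2.5, 0.15625]  # values chosen to be exact in single precision
raw = to_big_endian_floats(data)

# Reverse each four-byte group by hand: this is the byte-swap.
swapped = b"".join(raw[i:i + 4][::-1] for i in range(0, len(raw), 4))

print(raw[:4].hex())
print(swapped == struct.pack("<3f", *data))
print(from_big_endian_floats(raw))
```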