The Command Console generic data file
format is a file format developed by Affymetrix for storing a variety of
Affymetrix data and results, including scanner acquisition data, intensity data and
probe array analysis results. Unlike previous Affymetrix files, each of which
stores only one type of information, this file is designed to store
multiple data types, with the contents of the file described
within its header. This format was also developed to
support localization: strings stored within the file are stored as 2-byte
UNICODE characters.
Another design criterion for the file is the ability to uniquely
identify a file and its parentage independent of the file name. This is
accomplished by unique identifiers that are part of the file
header.
The file is a
binary file with data stored in network byte order (big-endian format).
The file is divided into "file header",
"generic data header" and
"data"
sections. Each section is described below.
The sections appear in the following order:
File Header
Generic Data Header (for the file)
    Generic Data Header (for the file's 1st parent)
        Generic Data Header (for the file's 1st parent's 1st parent)
        Generic Data Header (for the file's 1st parent's 2nd parent)
        ...
        Generic Data Header (for the file's 1st parent's Mth parent) (assuming there are M parents for the file's 1st parent)
    Generic Data Header (for the file's 2nd parent)
    ...
    Generic Data Header (for the file's Nth parent) (assuming the file was created from N parent files)
Data Group #1
    Data Set #1
        Parameters
        Column definitions
        Matrix of data
    Data Set #2
    ...
    Data Set #L (assuming there are L data sets in the group)
Data Group #2
...
Data Group #K (assuming there are K groups in the file)
The following table defines
the data types used in the file format description below.
Type | Description |
BYTE | 8-bit signed integral number |
UBYTE | 8-bit unsigned integral number |
SHORT | 16-bit signed integral number |
USHORT | 16-bit unsigned integral number |
INT | 32-bit signed integral number |
UINT | 32-bit unsigned integral number |
FLOAT | 32-bit signed floating point number |
DOUBLE | 64-bit signed floating point number |
GUID | STRING (see below) |
[ ] | Indicates array of data |
DATETIME | ISO 8601 date and time stored as a WSTRING, expressed in Coordinated Universal Time (UTC, also known as GMT or Greenwich Mean Time). E.g. "2005-11-23T13:45:53Z" |
LOCALE | ISO 639 language code (two characters) and ISO 3166 country code (two characters). Use only the language part of the specification. |
PARAMETER | BYTE (value type) / INT (size) / value object (depending on the data type and size). |
STRING | A string of 1-byte characters. A string object is stored as an INT (to store the string length) followed by the CHAR array (to store the string contents). |
WSTRING | A UNICODE string. A string object is stored as an INT (to store the string length) followed by the WCHAR array (to store the string contents). |
WCHAR | 2-byte character. |
CHAR | 1-byte character. |
CONTROLLEDLIST | An array of WSTRINGs. |
TYPE | A MIME type stored in a WSTRING. The possible MIME types used are:
- text/x-calvin-integer-8
- text/x-calvin-unsigned-integer-8
- text/x-calvin-integer-16
- text/x-calvin-unsigned-integer-16
- text/x-calvin-integer-32
- text/x-calvin-unsigned-integer-32
- text/x-calvin-float
- text/plain |
VALUE | A MIME-encoded string stored in a STRING. |
ROW | An array of data type values that make up a data set row. The data types in a row are defined in the data set header. |
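The primitive types above can be read with Python's standard `struct` module. The following is a minimal sketch, not a reference implementation. The function names are illustrative, and two points are assumptions that the table does not state: that 1-byte characters decode as Latin-1, and that the WSTRING length counts 2-byte characters (UTF-16, big-endian) rather than bytes.

```python
import io
import struct

# struct format strings for the fixed-size types. The ">" prefix selects
# big-endian (network) byte order, as the file format requires.
FORMATS = {
    "BYTE": ">b", "UBYTE": ">B",
    "SHORT": ">h", "USHORT": ">H",
    "INT": ">i", "UINT": ">I",
    "FLOAT": ">f", "DOUBLE": ">d",
}

def read_value(stream, type_name):
    """Read one fixed-size value of the named type from a binary stream."""
    fmt = FORMATS[type_name]
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) != size:
        raise EOFError(f"expected {size} bytes for {type_name}")
    return struct.unpack(fmt, data)[0]

def read_string(stream):
    """STRING: an INT length followed by that many 1-byte characters."""
    length = read_value(stream, "INT")
    # Assumption: 1-byte characters are decoded as Latin-1.
    return stream.read(length).decode("latin-1")

def read_wstring(stream):
    """WSTRING: an INT length followed by that many 2-byte characters."""
    length = read_value(stream, "INT")
    # Assumption: the length counts characters, so the payload is 2 * length
    # bytes of big-endian UTF-16.
    return stream.read(2 * length).decode("utf-16-be")

# Synthetic bytes for illustration: a STRING, a WSTRING and a USHORT.
buf = io.BytesIO(
    struct.pack(">i", 3) + b"abc"
    + struct.pack(">i", 2) + "hi".encode("utf-16-be")
    + struct.pack(">H", 513)
)
print(read_string(buf), read_wstring(buf), read_value(buf, "USHORT"))
```

A DATETIME or a TYPE can then be read with `read_wstring`, and a GUID with `read_string`, since the table defines them in terms of those string types.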
The
following table defines the numeric values for the value types. The
value type is used to represent the type of value stored in the file.
Value | Type |
0 | BYTE |
1 | UBYTE |
2 | SHORT |
3 | USHORT |
4 | INT |
5 | UINT |
6 | FLOAT |
7 | STRING |
8 | WSTRING |
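One way to use the value type table in code is a lookup from the numeric code to a type name and, for the numeric types, a `struct` format character. This is a sketch. The names `VALUE_TYPES` and `fixed_size` are illustrative and are not part of the format.

```python
import struct

# Value type codes from the table above, mapped to (type name, struct code).
# Strings have no fixed size, so their struct code is None.
VALUE_TYPES = {
    0: ("BYTE", "b"),
    1: ("UBYTE", "B"),
    2: ("SHORT", "h"),
    3: ("USHORT", "H"),
    4: ("INT", "i"),
    5: ("UINT", "I"),
    6: ("FLOAT", "f"),
    7: ("STRING", None),
    8: ("WSTRING", None),
}

def fixed_size(code):
    """Return the size in bytes of a numeric value type, or None for strings."""
    _, fmt = VALUE_TYPES[code]
    return None if fmt is None else struct.calcsize(">" + fmt)

for code, (name, _) in VALUE_TYPES.items():
    print(code, name, fixed_size(code))
```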
The file header section is the first section of the file. This section
is used to identify the type of file (i.e. Command Console data file), its version
number (for the file format) and the number of data
groups stored within the file. Information about the contents of the
file such as the data type identifier, the parameters used to create the file and its parentage is stored
within the generic data header
section.
Item | Description | Type |
1 | Magic number. A value to identify that this is a Command Console data file. The value is fixed at 59. | UBYTE |
2 | The version number of the file. This is the version of the file format. It is currently fixed at 1. | UBYTE |
3 | The number of data groups. | INT |
4 | File position of the first data group. | UINT |
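Because the four file header items have fixed sizes, the header can be decoded in a single `struct` call. The sketch below assumes the items are stored back to back with no padding. The `FileHeader` and `parse_file_header` names are illustrative, and the sample bytes are synthetic rather than taken from a real file.

```python
import struct
from typing import NamedTuple

class FileHeader(NamedTuple):
    magic: int
    version: int
    num_data_groups: int
    first_group_pos: int

# UBYTE, UBYTE, INT, UINT in big-endian order: 1 + 1 + 4 + 4 = 10 bytes.
FILE_HEADER = struct.Struct(">BBiI")

def parse_file_header(data: bytes) -> FileHeader:
    """Decode the file header from the first bytes of a file."""
    header = FileHeader(*FILE_HEADER.unpack_from(data, 0))
    if header.magic != 59:
        raise ValueError(f"not a Command Console file (magic={header.magic})")
    if header.version != 1:
        raise ValueError(f"unsupported format version {header.version}")
    return header

# Synthetic header: 2 data groups, the first one at file position 1024.
sample = struct.pack(">BBiI", 59, 1, 2, 1024)
print(parse_file_header(sample))
```

Checking the magic number and version first gives an early, clear error for files in another format.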
Following this section in the file is the generic data header
section.
This section stores the file and file type identifiers, data describing
the contents of the file, the parameters used to create it and
information about its parentage. The section is recursive: each generic
data header contains the generic data headers of its parent files, so the
entire parentage of a file can be traversed. This
information provides the entire history of how a file came to be.
The first generic data header immediately follows the
file header section.
Item | Description | Type |
1 | The data type identifier. This is used to identify the type of data stored in the file. For example:
- acquisition data (affymetrix-calvin-scan-acquisition)
- intensity data (affymetrix-calvin-intensity)
- expression results generated by MAS5 (affymetrix-probeset-analysis)
- expression results generated by RMA or PLIER (affymetrix-quantification-analysis)
- expression results generated by RMA or PLIER with DABG (affymetrix-quantification-detection-analysis)
- genotyping, copy number, copy number variation, DMET results (affymetrix-multi-data-type-analysis) | STRING |
2 | Unique file identifier. This is the identifier to use to link the file with parent files. This identifier is updated whenever the contents of the file change. Example: when a user manually aligns the grid in a DAT file, the grid coordinates are updated in the DAT file and the file is given a new file identifier. | GUID |
3 | Date and time of file creation. | DATETIME |
4 | The locale of the operating system that the file was created on. | LOCALE |
5 | The number of name/value/type parameters. | INT |
6 | Array of parameters stored as name/value/type triplets. | (WSTRING / VALUE / TYPE) [ ] |
7 | Number of parent file headers. | INT |
8 | Array of parent file headers. | Generic Data Header [ ] |
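Since item 8 is itself an array of generic data headers, a reader for this section is naturally recursive. The sketch below follows the item order in the table. It assumes that LOCALE is stored as a WSTRING and that WSTRING lengths count 2-byte characters, neither of which the tables state explicitly. VALUE is left as raw bytes because its interpretation depends on the accompanying TYPE. All function names are illustrative, and the test data is synthetic.

```python
import io
import struct

def read_int(stream):
    return struct.unpack(">i", stream.read(4))[0]

def read_string(stream):
    return stream.read(read_int(stream)).decode("latin-1")

def read_wstring(stream):
    # Assumption: the length prefix counts 2-byte characters, not bytes.
    return stream.read(2 * read_int(stream)).decode("utf-16-be")

def read_raw_value(stream):
    # VALUE is kept as raw bytes; decoding it depends on the TYPE string.
    return stream.read(read_int(stream))

def read_generic_header(stream):
    """Read one generic data header, then recurse into its parent headers."""
    header = {
        "data_type_id": read_string(stream),
        "file_id": read_string(stream),    # GUID is stored as a STRING
        "created": read_wstring(stream),   # DATETIME is a WSTRING
        "locale": read_wstring(stream),    # assumption: LOCALE is a WSTRING
    }
    params = []
    for _ in range(read_int(stream)):
        name = read_wstring(stream)
        value = read_raw_value(stream)
        mime_type = read_wstring(stream)
        params.append((name, value, mime_type))
    header["params"] = params
    num_parents = read_int(stream)
    header["parents"] = [read_generic_header(stream) for _ in range(num_parents)]
    return header

# Helpers that build synthetic headers for the demonstration below.
def w_int(n):
    return struct.pack(">i", n)

def w_str(s):
    return w_int(len(s)) + s.encode("latin-1")

def w_wstr(s):
    return w_int(len(s)) + s.encode("utf-16-be")

def encode_header(type_id, guid, parents=()):
    # No parameters (count 0), followed by the given encoded parent headers.
    return (w_str(type_id) + w_str(guid) + w_wstr("2005-11-23T13:45:53Z")
            + w_wstr("en-US") + w_int(0) + w_int(len(parents)) + b"".join(parents))

parent = encode_header("affymetrix-calvin-scan-acquisition", "guid-dat")
child = encode_header("affymetrix-calvin-intensity", "guid-cel", [parent])
h = read_generic_header(io.BytesIO(child))
print(h["data_type_id"], "<-", h["parents"][0]["data_type_id"])
```

Walking `h["parents"]` recursively yields the full history of the file, one header per ancestor.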
This
section describes the data group. A data group is a group of data sets.
A file contains one or more data groups.
Item | Description | Type |
1 | File position of the next data group. When this is the last data group in the file, the value should be 0. | UINT |
2 | File position of the first data set within the data group. | UINT |
3 | The number of data sets within the data group. | INT |
4 | The data group name. | WSTRING |
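The "next data group" position turns the data groups into a linked list that ends with 0. A minimal sketch of walking that list is shown below. The function names and the synthetic byte layout are illustrative, and the WSTRING length is assumed to count 2-byte characters.

```python
import io
import struct

def read_data_group_header(stream, position):
    """Read the data group header that starts at the given file position."""
    stream.seek(position)
    # UINT next group, UINT first data set, INT data set count, INT name length.
    next_group, first_set, num_sets, name_len = struct.unpack(">IIii", stream.read(16))
    # Assumption: the name length counts 2-byte characters (UTF-16, big-endian).
    name = stream.read(2 * name_len).decode("utf-16-be")
    return {"name": name, "next_group": next_group,
            "first_set": first_set, "num_sets": num_sets}

def iter_data_groups(stream, first_position):
    """Follow the chain of 'next data group' positions until it reaches 0."""
    position = first_position
    while position != 0:
        group = read_data_group_header(stream, position)
        yield group
        position = group["next_group"]

def encode_group(next_group, first_set, num_sets, name):
    # Helper that builds a synthetic data group header for the demonstration.
    return (struct.pack(">IIii", next_group, first_set, num_sets, len(name))
            + name.encode("utf-16-be"))

# Synthetic layout: 10 placeholder bytes, then two empty data groups.
first = 10
second = first + 16 + 2 * len("Group A")
blob = (b"\x00" * first
        + encode_group(second, 0, 0, "Group A")
        + encode_group(0, 0, 0, "Group B"))
groups = list(iter_data_groups(io.BytesIO(blob), first))
print([g["name"] for g in groups])
```

In a real file, the starting position would come from item 4 of the file header rather than a hard-coded offset.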
This
section describes the data for a single data set item (probe set,
sequence, allele, etc.). The file supports one or more data sets within
a data group.
Item | Description | Type |
1 | The file position of the first data element in the data set. This is the first byte after the data set header. | UINT |
2 | The file position of the next data set within the data group. When this is the last data set in the data group, the value shall be 1 byte past the end of the data set. This way the size of the data set may be determined. | UINT |
3 | The data set name. | WSTRING |
4 | The number of name/value/type parameters. | INT |
5 | Array of name/value/type parameters. | (WSTRING / VALUE / TYPE) [ ] |
6 | Number of columns in the data set. Example: for expression arrays, columns may include signal, p-value and detection call; for genotyping arrays, columns may include allele call and confidence value; for universal arrays, columns may include probe set intensities and background. | UINT |
7 | An array of column names, column value types and column type sizes (one per column). The value type shall be represented by the value from the value type table. The size shall be the size of the type in bytes. For strings, this value shall be the size of the string in bytes plus 4 bytes for the string length written before the string in the file. | (WSTRING / BYTE / INT) [ ] |
8 | The number of rows in the data set. | UINT |
9 | The data set table, consisting of rows of columns (data values). The specific type and size of each column is described by the column value types and sizes in item 7. | ROW [ ] |
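Once the column definitions (item 7) and the row count (item 8) are known, each row can be decoded by stepping through the columns in order. The sketch below assumes rows are stored back to back, that each column occupies exactly its declared size, and that any bytes of a string cell beyond the stored length are padding. The column names and values are hypothetical.

```python
import struct

# struct codes for the numeric value types (codes 0 to 6 in the value type table).
NUMERIC_FORMATS = {0: "b", 1: "B", 2: "h", 3: "H", 4: "i", 5: "I", 6: "f"}

def decode_rows(data, columns, num_rows):
    """Decode the data set table.

    columns is a list of (name, value_type, size) triples as stored in the
    data set header; data holds the bytes of the ROW array.
    """
    row_size = sum(size for _, _, size in columns)
    rows = []
    for r in range(num_rows):
        offset = r * row_size
        row = {}
        for name, value_type, size in columns:
            cell = data[offset:offset + size]
            if value_type in NUMERIC_FORMATS:
                row[name] = struct.unpack(">" + NUMERIC_FORMATS[value_type], cell)[0]
            elif value_type in (7, 8):
                # String cell: a 4-byte length, then the characters. Bytes past
                # the stored length are treated as padding (assumption).
                (length,) = struct.unpack(">i", cell[:4])
                if value_type == 7:
                    row[name] = cell[4:4 + length].decode("latin-1")
                else:
                    row[name] = cell[4:4 + 2 * length].decode("utf-16-be")
            else:
                raise ValueError(f"unknown value type {value_type}")
            offset += size
        rows.append(row)
    return rows

# Hypothetical columns: a STRING of up to 8 characters (8 + 4 = 12 bytes),
# a FLOAT and a UBYTE.
columns = [("ProbeSetName", 7, 12), ("Signal", 6, 4), ("Detection", 1, 1)]

def encode_row(name, signal, detection):
    text = name.encode("latin-1")
    # "8s" pads the text with zero bytes up to the fixed column width.
    return struct.pack(">i8sfB", len(text), text, signal, detection)

data = encode_row("AFFX-1", 1.5, 0) + encode_row("AFFX-22", 250.25, 2)
rows = decode_rows(data, columns, 2)
print(rows)
```

Computing the row size from the column sizes also allows jumping directly to row `r` at offset `r * row_size` without reading the rows before it.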
Affymetrix GUIDs are universally unique identifiers (UUIDs) used to identify files and retain relationships between files.
For example, "lineage GUIDs" are used to establish parent-child relationships between files. "Execution GUIDs" are used to identify CHP
files generated during the same analysis run.
To allow flexibility with our software, Affymetrix does not require GUIDs to be compliant with an established format
such as RFC 4122. It is the responsibility of the users of our software to ensure that UUIDs are unique.
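Since no particular GUID format is required, software that writes these files may choose its own scheme. As one possible choice (an assumption, not a description of what Affymetrix software does), the sketch below generates a random RFC 4122 version 4 UUID with Python's standard `uuid` module and encodes it as a STRING. The function names are illustrative.

```python
import struct
import uuid

def new_file_id() -> str:
    """Return a fresh identifier; here a random RFC 4122 version 4 UUID."""
    return str(uuid.uuid4())

def encode_guid(guid: str) -> bytes:
    """Encode a GUID as a STRING: big-endian INT length, then 1-byte characters."""
    raw = guid.encode("ascii")
    return struct.pack(">i", len(raw)) + raw

file_id = new_file_id()
encoded = encode_guid(file_id)
print(file_id, len(encoded))
```

A new identifier would be generated each time the contents of a file change, as described for the unique file identifier in the generic data header.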