|
The CEL file stores the results of the intensity calculations on the pixel values of
the DAT file. This includes an intensity value, the standard deviation of the
intensity, the number of pixels used to calculate the intensity value, a
flag indicating an outlier as calculated by the algorithm, and a user-defined
flag indicating that the feature should be excluded from future analysis.
The file stores this data for each feature on the probe
array. The information below describes the following versions:
- Version 3 is generated by the MAS software. This was also known as the ASCII version.
- Version 4 is generated by the GCOS software. This was also known as the binary or XDA version.
- Command Console version 1 is generated by the Command Console software. This is stored in the Command Console "generic" data file format.
The version 3 CEL file is an ASCII text file similar to
the Windows INI format.
The file is divided into sections. The start
of each section is defined by a line containing a section name enclosed in square
brackets. The section names are: "CEL", "HEADER", "INTENSITY", "MASKS", "OUTLIERS"
and "MODIFIED". The data in each section is of the format TAG=VALUE. The
"CEL" section contains the version number of the file. The TAGS are:
| TAG | Description |
| Version | The version number. Always set to 3. |

The "HEADER" section contains miscellaneous header information. The TAGS are:
| TAG | Description |
| Cols | The number of columns in the array (of cells). |
| Rows | The number of rows in the array (of cells). |
| TotalX | Same as Cols. |
| TotalY | Same as Rows. |
| OffsetX | Not used, always 0. |
| OffsetY | Not used, always 0. |
| GridCornerUL | XY coordinates of the upper left grid corner in pixel coordinates. |
| GridCornerUR | XY coordinates of the upper right grid corner in pixel coordinates. |
| GridCornerLR | XY coordinates of the lower right grid corner in pixel coordinates. |
| GridCornerLL | XY coordinates of the lower left grid corner in pixel coordinates. |
| Axis-invertX | Not used, always 0. |
| AxisInvertY | Not used, always 0. |
| swapXY | Not used, always 0. |
| DatHeader | The header from the DAT file. |
| Algorithm | The algorithm name used to create the CEL file. |
| AlgorithmParameters | The parameters used by the algorithm. The format is TAG:VALUE pairs separated by semi-colons or TAG=VALUE pairs separated by spaces. |

The "INTENSITY" section contains intensity information. The TAGS are:
| TAG | Description |
| NumberCells | The total number of cells in the array (Rows*Cols). |
| CellHeader | The header for the remainder of the data in this section. The header is always set to: "X Y MEAN STDV NPIXELS". |
| NA | The remaining lines in this section contain the XY coordinates, the intensity, the standard deviation value and the number of pixels used to compute the intensity value for each cell in the array. The order is defined by the header. |
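
As a minimal sketch of how one data line of the "INTENSITY" section could be read, the following Python splits a line on whitespace and converts the fields in the order given by the CellHeader. The `Cell` type, the function name and the sample line are hypothetical and only serve the illustration; whitespace separation of the fields is an assumption.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """One INTENSITY row, in CellHeader order: X Y MEAN STDV NPIXELS."""
    x: int
    y: int
    mean: float
    stdv: float
    npixels: int


def parse_intensity_line(line: str) -> Cell:
    # Assumes the five fields are separated by whitespace (tabs or spaces).
    x, y, mean, stdv, npixels = line.split()
    return Cell(int(x), int(y), float(mean), float(stdv), int(npixels))


# Made-up sample line, not taken from a real CEL file.
cell = parse_intensity_line("  0\t  1\t125.5\t12.3\t 16")
```

Returning a named tuple keeps the field order tied to the CellHeader while still allowing access by name, for example `cell.mean`.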
The "MASKS" section specifies which cells have been masked by the user.
The TAGS are:
| TAG | Description |
| NumberCells | The number of masked cells. |
| CellHeader | The header for the remainder of the data in this section. The header is always set to: "X Y". |
| NA | The remaining lines in this section contain the XY coordinates of those cells masked by the user. |

The "OUTLIERS" section specifies which cells
were called outliers by the software. The TAGS are:
| TAG | Description |
| NumberCells | The number of outlier cells. |
| CellHeader | The header for the remainder of the data in this section. The header is always set to: "X Y". |
| NA | The remaining lines in this section contain the XY coordinates of those cells called outliers by the software. |

The "MODIFIED" section specifies
which cells were modified by the user. This feature was dropped in MAS 4, thus
the number of cells in this section should always be 0. The TAGS are:
| TAG | Description |
| NumberCells | The number of modified cells. |
| CellHeader | The header for the remainder of the data in this section. The header is always set to: "X Y ORIGMEAN". |
| NA | The remaining lines in this section contain the XY coordinates and the original intensity value (calculated by the software) of those cells modified by the user. |
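
The section layout described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference parser: the function name and the returned dictionary layout are hypothetical, any line containing "=" is treated as a TAG=VALUE pair, and every other non-empty line is treated as a data row (the "NA" entries in the tables).

```python
def split_cel_v3(text: str) -> dict:
    """Split version 3 CEL text into {section: {"tags": {...}, "rows": [...]}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A section name enclosed in square brackets starts a new section.
            current = {"tags": {}, "rows": []}
            sections[line[1:-1]] = current
        elif current is None:
            continue  # ignore anything before the first section
        elif "=" in line:
            # Split on the first "=" only, since values may contain "=" too.
            tag, _, value = line.partition("=")
            current["tags"][tag] = value
        else:
            current["rows"].append(line.split())
    return sections


# Made-up miniature file, for illustration only.
sample = """[CEL]
Version=3

[HEADER]
Cols=2
Rows=1

[INTENSITY]
NumberCells=2
CellHeader=X\tY\tMEAN\tSTDV\tNPIXELS
  0\t  0\t100.0\t10.0\t 9
  1\t  0\t200.0\t20.0\t 9
"""
sections = split_cel_v3(sample)
```

Values are kept as strings here; converting them (for example `int(sections["HEADER"]["tags"]["Cols"])`) is left to the caller because the value type differs per tag.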
The version 4 CEL file is a binary file in which values are stored in little-endian format.
The file contents are defined by:
| Item | Description | Type |
| 1 | Magic number. Always set to 64. | integer |
| 2 | Version number. Always set to 4. | integer |
| 3 | Number of columns. | integer |
| 4 | Number of rows. | integer |
| 5 | Number of cells (rows*cols). | integer |
| 6 | Header length. | integer |
| 7 | Header as defined in the HEADER section of the version 3 CEL files. The string contains TAG=VALUE pairs separated by a space, where the TAG names are defined in the version 3 HEADER section. | char[length defined above] |
| 8 | Algorithm name length. | integer |
| 9 | The algorithm name used to create the CEL file. | char[length defined above] |
| 10 | Algorithm parameters length. | integer |
| 11 | The parameters used by the algorithm. The format is TAG:VALUE pairs separated by semi-colons or TAG=VALUE pairs separated by spaces. | char[length defined above] |
| 12 | Cell margin used for computing the cell's intensity value. | integer |
| 13 | Number of outlier cells. | DWORD |
| 14 | Number of masked cells. | DWORD |
| 15 | Number of sub-grids. | integer |
| 16 | Cell entries - this consists of an intensity value, standard deviation value and pixel count for each cell in the array. The values are stored by row then column starting with the X=0, Y=0 cell. As an example, the first five entries are for cells defined by XY coordinates: (0,0), (1,0), (2,0), (3,0), (4,0). | (float, float, short) |
| 17 | Masked entries - this consists of the XY coordinates of those cells masked by the user. | (short, short) |
| 18 | Outlier entries - this consists of the XY coordinates of those cells called outliers by the software. | (short, short) |
| 19 | Sub-grid entries - this is the sub-grid definition. There are as many sub-grids in the file as defined by the number of sub-grids above. Each sub-grid is defined as: row number (integer); column number (integer); upper left x coordinate in pixels (float); upper left y coordinate in pixels (float); upper right x coordinate in pixels (float); upper right y coordinate in pixels (float); lower left x coordinate in pixels (float); lower left y coordinate in pixels (float); lower right x coordinate in pixels (float); lower right y coordinate in pixels (float); left cell position (integer); top cell position (integer); right cell position (integer); bottom cell position (integer). | (integer, integer, float, float, float, float, float, float, float, float, integer, integer, integer, integer) |

Types used are defined as: integer (a 32-bit signed integer), DWORD (a 32-bit unsigned integer), float (a 32-bit floating-point number), short (a 16-bit signed integer).
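
The item table above maps fairly directly onto Python's `struct` module. The following is a sketch under stated assumptions rather than a definitive reader: the function names and the returned dictionary keys are hypothetical, the strings are assumed to be ASCII and not null-terminated, and the records are assumed to be packed without padding. The byte string at the end is a made-up two-cell file built only to exercise the code.

```python
import io
import struct


def _read_int(f):
    """Read one little-endian 32-bit signed integer."""
    return struct.unpack("<i", f.read(4))[0]


def _read_string(f):
    """Read an integer length followed by that many characters."""
    length = _read_int(f)
    return f.read(length).decode("ascii", errors="replace")  # ASCII is assumed


def read_cel_v4(f):
    """Read items 1-19 of the version 4 layout from a binary file object."""
    magic, version = _read_int(f), _read_int(f)
    if magic != 64 or version != 4:
        raise ValueError("not a version 4 CEL file")
    cols, rows, n_cells = _read_int(f), _read_int(f), _read_int(f)
    header = _read_string(f)
    algorithm = _read_string(f)
    parameters = _read_string(f)
    cell_margin = _read_int(f)
    n_outliers, n_masked = struct.unpack("<II", f.read(8))  # two DWORDs
    n_subgrids = _read_int(f)

    cell = struct.Struct("<ffh")        # intensity, std. dev., pixel count
    xy = struct.Struct("<hh")           # X, Y
    subgrid = struct.Struct("<ii8f4i")  # item 19 layout
    cells = [cell.unpack(f.read(cell.size)) for _ in range(n_cells)]
    masked = [xy.unpack(f.read(xy.size)) for _ in range(n_masked)]
    outliers = [xy.unpack(f.read(xy.size)) for _ in range(n_outliers)]
    subgrids = [subgrid.unpack(f.read(subgrid.size)) for _ in range(n_subgrids)]
    return {
        "cols": cols, "rows": rows, "header": header,
        "algorithm": algorithm, "parameters": parameters,
        "cell_margin": cell_margin, "cells": cells,
        "masked": masked, "outliers": outliers, "subgrids": subgrids,
    }


def _pack_string(s):
    data = s.encode("ascii")
    return struct.pack("<i", len(data)) + data


# Made-up file: 2 columns, 1 row, one outlier, no masked cells, no sub-grids.
blob = (
    struct.pack("<5i", 64, 4, 2, 1, 2)
    + _pack_string("Cols=2 Rows=1")
    + _pack_string("Percentile")
    + _pack_string("Percentile:75;CellMargin:2")
    + struct.pack("<iIIi", 2, 1, 0, 0)
    + struct.pack("<ffh", 100.0, 10.0, 9)
    + struct.pack("<ffh", 200.0, 20.0, 9)
    + struct.pack("<hh", 1, 0)
)
cel = read_cel_v4(io.BytesIO(blob))
```

The `<` prefix in each format string selects little-endian byte order with standard sizes and no alignment padding, so a cell entry occupies exactly 10 bytes. For a real file, the same function could be called on an object opened with `open(path, "rb")`.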
The CEL file generated by the Command Console software uses the Command Console
generic data format. The following describes the data sets and groups in
the file.
The generic data header shall include:
- The data type identifier, set to "affymetrix-calvin-intensity".
- The parameters, which depend on the algorithm used to create the CEL file. For the percentile algorithm these include the following parameters:
| Parameter Name | Definition |
| affymetrix-algorithm-param-Percentile | The percentile value used. |
| affymetrix-algorithm-param-CellMargin | The number of pixels around the border to ignore. |
| affymetrix-algorithm-param-OutlierHigh | The high threshold for the outlier calculation. |
| affymetrix-algorithm-param-OutlierLow | The low threshold for the outlier calculation. |
| affymetrix-algorithm-param-GridULX | The X coordinate of the upper left corner of the global grid. |
| affymetrix-algorithm-param-GridULY | The Y coordinate of the upper left corner of the global grid. |
| affymetrix-algorithm-param-GridURX | The X coordinate of the upper right corner of the global grid. |
| affymetrix-algorithm-param-GridURY | The Y coordinate of the upper right corner of the global grid. |
| affymetrix-algorithm-param-GridLRX | The X coordinate of the lower right corner of the global grid. |
| affymetrix-algorithm-param-GridLRY | The Y coordinate of the lower right corner of the global grid. |
| affymetrix-algorithm-param-GridLLX | The X coordinate of the lower left corner of the global grid. |
| affymetrix-algorithm-param-GridLLY | The Y coordinate of the lower left corner of the global grid. |
Other parameters include:
| Parameter Name | Definition |
| affymetrix-array-type | The probe array type. |
| affymetrix-algorithm-name | The name of the algorithm. |
| affymetrix-cel-cols | The number of columns of features (cells). |
| affymetrix-cel-rows | The number of rows of features (cells). |
| affymetrix-file-version | File version. |
The DAT file parameters (if available) will be stored within the parent
data header object.
The intensity data is stored in a single group called Default Group with 5 data sets. The data
sets are defined as:
| Data Set Name | Description | Number of Columns | Column Name | Column Type | Column Description |
| Intensity | The intensity values for each feature. | 1 | Intensity | FLOAT | The intensity value. The row order is the same as defined in the GCOS CEL file. |
| StdDev | The standard deviations of the intensity values. | 1 | StdDev | FLOAT | The standard deviation value. The row order is the same as defined in the GCOS CEL file. |
| Pixel | The number of pixels used to calculate the intensity values. | 1 | Pixel | SHORT | The pixel count value. The row order is the same as defined in the GCOS CEL file. |
| Outlier | The X/Y coordinates of those features called as outliers by the algorithm. | 2 | X, Y | SHORT, SHORT | The X coordinate of the outlier cell; the Y coordinate of the outlier cell. |
| Mask | The X/Y coordinates of the user masked features. | 2 | X, Y | SHORT, SHORT | The X coordinate of the masked cell; the Y coordinate of the masked cell. |
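
Because the Intensity, StdDev and Pixel data sets use the same row order as the GCOS (version 4) CEL file, where cells are stored by row starting at X=0, Y=0, a row index can be converted to cell coordinates with simple arithmetic. The sketch below assumes that X runs along the columns and that `cols` is the value of `affymetrix-cel-cols`; both function names are hypothetical.

```python
def row_to_xy(index, cols):
    """Convert a data set row index to (X, Y) cell coordinates."""
    y, x = divmod(index, cols)
    return x, y


def xy_to_row(x, y, cols):
    """Convert (X, Y) cell coordinates to a data set row index."""
    return y * cols + x


# With 4 columns, row 5 is the second cell of the second row.
example_xy = row_to_xy(5, 4)
example_row = xy_to_row(1, 1, 4)
```

This is how a coordinate pair from the Outlier or Mask data set could be used to look up the matching row in the Intensity data set.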
|