The BPMAP file contains information relating to the design of the Affymetrix tiling arrays. Version 2 added the ability to store a version, group and parameters associated with each sequence item. Version 3 added the ability to store perfect match (PM) only probes in addition to PM/MM probe pairs.

The BPMAP file is a binary file with data stored in big-endian format. The following lists the sections in the order in which they appear in the file. Each section is defined in detail below.
File Header
Sequence Description for sequence #1
Sequence Description for sequence #2
...
Sequence Description for sequence #N
Sequence Header for sequence #1
Position Information for probe/probe pair #1 of sequence #1
Position Information for probe/probe pair #2 of sequence #1
...
Position Information for probe/probe pair #M_1 of sequence #1
Sequence Header for sequence #2
Position Information for probe/probe pair #1 of sequence #2
Position Information for probe/probe pair #2 of sequence #2
...
Position Information for probe/probe pair #M_2 of sequence #2
...

This layout assumes there are N sequences and M_i probes or probe pairs for sequence i.

File Header

| Item | Description | Type | Size |
| --- | --- | --- | --- |
| 1 | Magic number. A value to identify the file type. The value is set to 'PHT7\r\n\032\n'. | char | 8 bytes |
| 2 | The version number of the file. The version number is either 1.0, 2.0 or 3.0. Due to a bug in the BPMAP file writer for early access arrays, this value may not be stored as a big-endian float. To read this value on a big-endian machine: read 4 bytes, swap the order of the bytes, cast the result to an integer, swap the bytes and cast to a float. On a little-endian machine: read 4 bytes, cast the value to an integer, swap the bytes and cast to a float. | float | 4 bytes |
| 3 | Number of sequences stored in the file. | unsigned int | 4 bytes |
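
The file header fields above can be read with a few lines of Python. The following is a minimal sketch, not a reference implementation: the function names `decode_version` and `read_file_header` are hypothetical, and the legacy branch reflects one reading of the workaround described for item 2, in which "cast" means a numeric conversion. It has not been checked against real early access files.

```python
import math
import struct

MAGIC = b"PHT7\r\n\x1a\n"  # '\032' in the table is octal notation for byte 0x1A
KNOWN_VERSIONS = (1.0, 2.0, 3.0)


def decode_version(raw: bytes) -> float:
    """Decode the 4-byte version field, tolerating the legacy encoding."""
    # Normal case: the field is a big-endian IEEE 754 float.
    (value,) = struct.unpack(">f", raw)
    if value in KNOWN_VERSIONS:
        return value
    # Legacy case (assumed interpretation of the little-endian procedure):
    # read the bytes as a little-endian float, convert the value to an
    # integer, swap the integer's bytes, then convert back to a float.
    (as_le_float,) = struct.unpack("<f", raw)
    if not math.isfinite(as_le_float) or not 0 <= as_le_float < 2**32:
        raise ValueError("unrecognised version field")
    swapped = int.from_bytes(int(as_le_float).to_bytes(4, "little"), "big")
    value = float(swapped)
    if value not in KNOWN_VERSIONS:
        raise ValueError("unrecognised version field")
    return value


def read_file_header(data: bytes):
    """Return (version, number_of_sequences, offset_after_header)."""
    if data[:8] != MAGIC:
        raise ValueError("not a BPMAP file: bad magic number")
    version = decode_version(data[8:12])
    (n_sequences,) = struct.unpack(">I", data[12:16])
    return version, n_sequences, 16
```

Trying the plain big-endian decoding first and falling back only when the result is not 1.0, 2.0 or 3.0 keeps the sketch independent of the byte order of the machine it runs on.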

Sequence Description

| Item | Description | Type | Size |
| --- | --- | --- | --- |
| 1 | Length of the sequence name. | unsigned int | 4 bytes |
| 2 | Sequence name. | char | Specified by item #1. |
| 3 | Probe mapping type (only for version 3.0 and above files). 0 indicates a PM/MM probe pair tiling across the sequence. 1 indicates a PM-only tiling across the sequence. | unsigned int | 4 bytes |
| 4 | Sequence file offset (only for version 3.0 and above files). The offset (in bytes), from the beginning of the file, of the probe position information. This is intended to enable fast look-up. | unsigned int | 4 bytes |
| 5 | Number of probes/probe pairs in the sequence. | unsigned int | 4 bytes |
| 6 | Length of the group name (only for version 2.0 and above files). | unsigned int | 4 bytes |
| 7 | Group name (only for version 2.0 and above files). | char | Specified by item #6. |
| 8 | Length of the version number (only for version 2.0 and above files). | unsigned int | 4 bytes |
| 9 | Version number (only for version 2.0 and above files). | char | Specified by item #8. |
| 10 | Number of parameters (only for version 2.0 and above files). | unsigned int | 4 bytes |
| 11 | Parameter name/value pairs (only for version 2.0 and above files). The number of parameters is specified by item #10. Each parameter is a pair of name/value strings, and each string is stored as an unsigned int (4 bytes) giving the length of the string, followed by that many chars holding the characters of the string. | See the description. | See the description. |
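
Because several fields exist only in later file versions, a reader has to branch on the version number while walking a sequence description. The sketch below illustrates this under stated assumptions: the function name and dictionary keys are hypothetical, and strings are decoded as Latin-1 because the format description does not name a character encoding.

```python
import struct


def read_sequence_description(data: bytes, offset: int, version: float):
    """Parse one sequence description; return (fields, next_offset)."""

    def read_uint(pos):
        # 4-byte big-endian unsigned integer
        return struct.unpack_from(">I", data, pos)[0], pos + 4

    def read_string(pos):
        # Length-prefixed string: unsigned int length, then the characters.
        length, pos = read_uint(pos)
        text = data[pos:pos + length].decode("latin-1")  # encoding is an assumption
        return text, pos + length

    fields = {}
    fields["name"], offset = read_string(offset)              # items 1-2
    if version >= 3.0:
        fields["mapping_type"], offset = read_uint(offset)    # item 3: 0 = PM/MM, 1 = PM-only
        fields["file_offset"], offset = read_uint(offset)     # item 4
    fields["n_probes"], offset = read_uint(offset)            # item 5
    if version >= 2.0:
        fields["group"], offset = read_string(offset)         # items 6-7
        fields["version"], offset = read_string(offset)       # items 8-9
        n_params, offset = read_uint(offset)                  # item 10
        params = []
        for _ in range(n_params):                             # item 11
            name, offset = read_string(offset)
            value, offset = read_string(offset)
            params.append((name, value))
        fields["parameters"] = params
    return fields, offset
```

Returning the next offset alongside the parsed fields lets the caller loop over all N descriptions without tracking field sizes itself.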

Sequence Header

| Item | Description | Type | Size |
| --- | --- | --- | --- |
| 1 | Sequence ID | unsigned int | 4 bytes |

Position Information

| Item | Description | Type | Size |
| --- | --- | --- | --- |
| 1 | X coordinate on array of the perfect match (PM) probe (note: array coordinates are 0 based). | unsigned int | 4 bytes |
| 2 | Y coordinate on array of the PM probe. | unsigned int | 4 bytes |
| 3 | X coordinate on array of the mismatch (MM) probe (only if the probe mapping type indicates PM/MM tiling). | unsigned int | 4 bytes |
| 4 | Y coordinate on array of the MM probe (only if the probe mapping type indicates PM/MM tiling). | unsigned int | 4 bytes |
| 5 | Length of the PM probe (and MM if a pair). | unsigned char | 1 byte |
| 6 | Probe sequence. The 25 base probe sequence is packed into a 7 byte character sequence. Each byte represents up to 4 bases (so the format can handle probes of length up to 25bp). The first byte contains the first 4 bases of the probe. The first base of the probe is encoded in the two most significant bits of the first byte. The fourth base of the probe is encoded in the two least significant bits of the first byte. The conversion from each pair of bits to a DNA base is as follows: (0,1,2,3) -> (A,C,G,T). | char | 7 bytes |
| 7 | Match score. Note: The current BPMAP files are based on perfect match so the scores are 1. See the bug description in the version number field above. | float | 4 bytes |
| 8 | Position of the PM probe within the sequence. Note: The position is the 0-based position of the lower coordinate of the 25-mer aligned to the target. | unsigned int | 4 bytes |
| 9 | 1 if the matching target (not the probe) is on the forward strand, 0 if on the reverse. | unsigned char | 1 byte |
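
The 2-bit probe packing and the optional MM coordinates are easier to follow in code. The sketch below is illustrative only: `unpack_probe` and `read_position` are hypothetical names, and the match score is returned as raw bytes because, per item 7, it may be affected by the same encoding bug as the version number. Summing the field sizes gives 25 bytes per record for PM-only tiling and 33 bytes for PM/MM tiling.

```python
import struct

BASES = "ACGT"  # 2-bit codes 0, 1, 2, 3 map to A, C, G, T


def unpack_probe(packed: bytes, length: int) -> str:
    """Decode `length` bases from the packed probe field."""
    if length > 4 * len(packed):
        raise ValueError("probe length exceeds the packed field capacity")
    bases = []
    for i in range(length):
        # The first base of each byte sits in the two most significant bits.
        shift = 6 - 2 * (i % 4)
        bases.append(BASES[(packed[i // 4] >> shift) & 0b11])
    return "".join(bases)


def read_position(data: bytes, offset: int, pm_only: bool):
    """Parse one position record; return (fields, next_offset)."""
    fields = {}
    fields["pm_x"], fields["pm_y"] = struct.unpack_from(">II", data, offset)
    offset += 8
    if not pm_only:
        # MM coordinates are present only for PM/MM tiling.
        fields["mm_x"], fields["mm_y"] = struct.unpack_from(">II", data, offset)
        offset += 8
    probe_length = data[offset]                                # item 5, 1 byte
    fields["probe"] = unpack_probe(data[offset + 1:offset + 8], probe_length)
    fields["match_score_raw"] = data[offset + 8:offset + 12]   # left undecoded
    (fields["position"],) = struct.unpack_from(">I", data, offset + 12)
    fields["target_on_forward"] = data[offset + 16] == 1
    return fields, offset + 17
```

For example, the byte 0x1B is 00 01 10 11 in binary, which decodes to the four bases ACGT.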