Affymetrix BPMAP File Format

TPMAP FILE

Description
The TPMAP (text probe map) Version 3 file format stores the relationship between (PM,MM) probe pairs (perfect match, mismatch), or PM probes on a PM-only array, and positions on a set of sequences of interest.

Each sequence of interest must be identified by a "Sequence Group Name" (e.g., Species or organism), "Version" (e.g., build of genome database) and "Target sequence name" (e.g., chromosome).

Format
The format is plain text.

Probes that share a Group Name, Version and Target sequence name occur together in a block, so every entry in a block has the same target sequence name. The first block must be headed by two lines giving the Group Name and Version; any later block that omits these two lines inherits them from the preceding block:

#seq_group_name group_name
#version version

followed by optional tag/value pairs (see below). Fields on these lines are whitespace delimited; fields after the first two may be present but are ignored.
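The inheritance rule can be sketched in Python as follows. This is only an illustration, not Affymetrix code: the function name rows_with_context is hypothetical, and it assumes the comment and tag/value conventions described later in this document.

```python
from typing import Iterable, Iterator, Tuple

def rows_with_context(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (group_name, version, target_name) for every probe row.

    Hypothetical helper: the most recent #seq_group_name and #version
    lines stay in force until a later pair replaces them.
    """
    group = version = None
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "#":
            continue  # empty line or comment
        if fields[0] in ("#seq_group_name", "#version"):
            if len(fields) < 2:
                raise ValueError(f"missing value on line: {line!r}")
            # Only the first two fields are used; the rest are ignored.
            if fields[0] == "#seq_group_name":
                group = fields[1]
            else:
                version = fields[1]
        elif fields[0].startswith("#"):
            continue  # some other tag/value line
        else:
            if group is None or version is None:
                raise ValueError("probe row before #seq_group_name/#version")
            yield group, version, fields[2]  # third field: target name
```

Applied to the sample file below, the chr1 and chr3 blocks would report the same group name and version as the chr2 block, because no new header pair precedes them.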

There is one row for every instance of a (PM,MM) probe pair or a PM probe aligning to a target sequence. The probe sequence is the synthesis sequence of the probe on the chip, in 5'-3' orientation; it is the reverse complement of the target sequence.
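As an illustration of the reverse-complement relationship, here is a minimal Python sketch. The helper name reverse_complement is an assumption made for this example and is not part of the format; it only covers the alphabet allowed for probe sequences.

```python
# Complement table for the alphabet allowed in a TPMAP probe sequence.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]

# A probe sequence taken from the sample file in this document.
probe = "AATAGCCCTCATGTACGTCTCCTCC"
print(reverse_complement(probe))  # GGAGGAGACGTACATGAGGGCTATT
```

Given a probe row, the stretch of target sequence it aligns to could be recovered this way for a sanity check against the reference sequence.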

Each row consists of the following 6-9 whitespace-separated entries:

Probe sequence (must be in [acgtACGT], length must be in [1,27], all probes must have same length).
Alignment strand indicator: 1 or t or T or + if target (not probe!) is on top strand, 0 or f or F or - otherwise
Target sequence name
Alignment position in target sequence (0-based, lower coordinate of alignment)
X coordinate of PM probe in array (0-based)
Y coordinate of PM probe in array (0-based)

For a PM Only sequence block:

Optional Match Score - float between 0.0 and 1.0

For a PM and MM sequence block:

X coordinate of MM probe in array (0-based)
Y coordinate of MM probe in array (0-based)
Optional Match Score - float between 0.0 and 1.0

Mixtures of PM-only probes and PM/MM probe pairs are not supported within a sequence block.
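The row layout above can be sketched as a small Python parser. This is a hedged illustration rather than a reference implementation: ProbeRow and parse_row are hypothetical names, and telling PM-only rows from PM/MM rows by field count (6 or 7 fields versus 8 or 9) is an assumption inferred from the field list. The default score of 1.0 follows the rule stated in this description. Cross-row rules (equal probe length, no mixing within a block) are not checked here.

```python
from typing import NamedTuple, Optional, Tuple

TOP_STRAND = {"1", "t", "T", "+"}
BOTTOM_STRAND = {"0", "f", "F", "-"}

class ProbeRow(NamedTuple):
    sequence: str
    target_on_top: bool                # True if the target is on the top strand
    target_name: str
    position: int                      # 0-based, lower coordinate of alignment
    pm_xy: Tuple[int, int]
    mm_xy: Optional[Tuple[int, int]]   # None for a PM-only row
    match_score: float

def parse_row(line: str) -> ProbeRow:
    fields = line.split()
    if not 6 <= len(fields) <= 9:
        raise ValueError(f"expected 6-9 fields, got {len(fields)}")
    seq, strand, name = fields[0], fields[1], fields[2]
    if not 1 <= len(seq) <= 27 or set(seq) - set("acgtACGT"):
        raise ValueError(f"bad probe sequence: {seq!r}")
    if strand not in TOP_STRAND | BOTTOM_STRAND:
        raise ValueError(f"bad strand indicator: {strand!r}")
    position, pm_x, pm_y = (int(f) for f in fields[3:6])
    extra = fields[6:]
    mm_xy = None
    if len(extra) >= 2:                # assumption: 8 or 9 fields means PM/MM
        mm_xy = (int(extra[0]), int(extra[1]))
        extra = extra[2:]
    score = float(extra[0]) if extra else 1.0  # absent score defaults to 1.0
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"match score out of range: {score}")
    return ProbeRow(seq, strand in TOP_STRAND, name, position,
                    (pm_x, pm_y), mm_xy, score)
```

For instance, parse_row("TAAGTAGAGAGATGGATGGTGGTTG 1 chr3 477 1297 981 .9") would give a PM-only row with a match score of 0.9.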

Tag/value lines are supported, in the form:

#tag value

where tag and value are strings. The tag name does not include the leading '#'. Any line whose first non-whitespace string is a lone '#' is treated as a comment.
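One way to tell these line types apart is sketched below in Python. The function classify_line and its return shape are hypothetical. Taking only the second field as the value and ignoring the rest is an assumption carried over from the rule for the header lines.

```python
from typing import List, Optional, Tuple, Union

Payload = Optional[Union[Tuple[str, str], List[str]]]

def classify_line(line: str) -> Tuple[str, Payload]:
    """Classify one TPMAP line as 'blank', 'comment', 'tag' or 'data'."""
    fields = line.split()
    if not fields:
        return ("blank", None)
    first = fields[0]
    if first == "#":
        # A lone '#' as the first non-whitespace string marks a comment.
        return ("comment", None)
    if first.startswith("#"):
        tag = first[1:]  # strip the leading '#' from the tag name
        value = fields[1] if len(fields) > 1 else ""
        return ("tag", (tag, value))
    return ("data", fields)  # a probe row, left unparsed here
```

Under this sketch, "#version 11_Nov_2005 (Required Line)" would be classified as the tag "version" with the value "11_Nov_2005", while "# version" would be a comment.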

If the Match Score is absent, it is set to 1.0.

Example
Below is a sample tpmap file:

#seq_group_name HS1 (Required Line, whitespace delimited using default std::cin, only first 2 fields used).
#version 11_Nov_2005 (Required Line, as above)
#tag1 value1 (Optional tag/value line, white space delimited, leading # stripped from tagname)
#tag2 value2 (Another example of a tag/value optional line)

# this line is ignored because of white space after the #, empty lines are ignored too

GCCCTGTTGTCTCTTACCCGGATGT f chr2 28 1112 2013 1112 2014
AATAGCCCTCATGTACGTCTCCTCC f chr2 1 1290 1449 1290 1450
AATAGCCCTCATGTACGTCTCCTCC f chr2 1 1291 1449 1291 1450
AATAGCCCTCATGTACGTCTCCTCC f chr2 1 1292 1449 1292 1450
GGAGGAGACGTACATGAGGGCTATT t chr2 1 1466 949 1466 950

# this sequence block inherits the group_name and version and other tags
GTAATGGAGGGTAAGTTGAGAGACA t chr1 107 1729 1497 1729 1498
GGTAATGGAGGGTAAGTTGAGAGAC t chr1 108 509 1397 509 1398
TAGGGCTGTGTTAGGGTAGTGTTAG t chr1 64 1745 1095 1745 1096
GGTAATGGAGGGTAAGTTGAGAGAC t chr1 108 510 1397 510 1398
CACTACCCTAACACAGCCCTAATCT f chr1 68 991 1953 991 1954
GGTTAGATTAGGGCTGTGTTAGGGT t chr1 72 295 1987 295 1988
GTCTCTCAACTTACCCTCCATTACC f chr1 108 355 1437 355 1438
GTCTCTCAACTTACCCTCCATTACC F chr1 108 354 1437 354 1438

# a PMOnly block with a match score in line 2
GTAGAGAGATGGATGGTGGTTGGGA t chr3 474 2305 1565
TAAGTAGAGAGATGGATGGTGGTTG 1 chr3 477 1297 981 .9
TAAGTAGAGAGATGGATGGTGGTTG t chr3 477 1298 981
AGTAAGTAGAGAGATGGATGGTGGT t chr3 479 631 779
TAGTAAGTAGAGAGATGGATGGTGG t chr3 480 843 1211
TAGTAAGTAGAGAGATGGATGGTGG t chr3 480 844 1211
TAGTAAGTAGAGAGATGGATGGTGG t chr3 480 845 1211
TAGTAAGTAGAGAGATGGATGGTGG t chr3 480 846 1211
GTAGTAAGTAGAGAGATGGATGGTG t chr3 481 1397 1011
GTTGGTGGTAGTAAGTAGAGAGATG t chr3 488 2520 1035
GACGGTGGGTTGGTGGTAGTAAGTA t chr3 496 1000 2453

#seq_group_name B_subtilis
#version 1/1/00:11am
#tag1 controls
# has a match score in first line
CACACCCTAACACTACCCTAACACT 0 chr1 356 1069 1943 3069 1944 0.8
ACCCTAACACTACCCTAACACTACC f chr1 359 2059 2371 2059 2372
GGGTAGTGTTAGGGTAGTGTTAGGG t chr1 360 1704 2249 1704 2250
TAGGGTAGTGTTAGGGTAGTGTTAG T chr1 2012000111 1996 2209 1996 2210

# ###### Explanation of the sequence blocks:
# There is a row corresponding to every instance of a (PM,MM) probe pair or a PM probe aligning
# to a target sequence. Each row consists of the following 6-9 whitespace-separated entries:

# 1 Probe sequence (must be in [acgtACGT], length must be in [1,27], all probes must have same length)
# 2 Alignment strand indicator: 1 or t or T if target (not probe!) is on top strand, 0 or f or F otherwise
# 3 Target sequence name
# 4 Alignment position in target sequence (0-based, lower coordinate of alignment)
# 5 X coordinate of PM probe in array (0-based)
# 6 Y coordinate of PM probe in array (0-based)
# For a PM Only sequence block:
# 7 Optional Match Score - float between 0.0 and 1.0, represents how well seq matches target
# For a PM and MM sequence block:
# 7 X coordinate of MM probe in array (0-based)
# 8 Y coordinate of MM probe in array (0-based)
# 9 Optional Match Score - float between 0.0 and 1.0
#
# Mixtures of PM only probes and PM/MM probe pairs are not supported within a sequence block