The TPMAP (text probe map)
Version 3 file format is used to store the relationship between (PM,MM)
probe pairs or PM probes (PM only array) and positions on a set of
sequences of interest. Each sequence of interest must be identified by a "Sequence Group Name"
(e.g., Species or organism), "Version" (e.g., build of genome database)
and "Target sequence name" (e.g., chromosome).
The format is plain text.
Probes with the same Group Name, Version and Target sequence name occur in
blocks, with all entries in a block having the same target sequence name. The
first such block must be headed by two lines giving Group Name and Version
(any successive blocks without these two lines will inherit them):
#seq_group_name group_name
#version version
and then optional tag/value pairs (see below). Fields are whitespace
delimited. Fields after the first two may be present but are
ignored.
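The inheritance rule for header lines can be sketched as follows. This is a minimal illustration in Python, not a reference reader. The function name assign_headers is hypothetical, and the sketch assumes that a new #seq_group_name line does not reset the version already in effect, which the format description does not state either way.

```python
def assign_headers(lines):
    """Pair each data row with the group name and version in effect.

    Hypothetical helper for illustration; it does not validate the rows.
    """
    group_name = None
    version = None
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # empty lines are skipped
        if fields[0] in ("#seq_group_name", "#version"):
            if len(fields) < 2:
                raise ValueError(f"missing value in header line: {line!r}")
            # Only the first two fields are used; the rest are ignored.
            if fields[0] == "#seq_group_name":
                group_name = fields[1]
            else:
                version = fields[1]
        elif fields[0].startswith("#"):
            continue  # comments and other tag/value lines, not handled here
        else:
            if group_name is None or version is None:
                raise ValueError("data row before #seq_group_name/#version")
            # Later blocks without their own header inherit the last one seen.
            rows.append((group_name, version, line.strip()))
    return rows
```

Rows that follow a block without its own header lines come back paired with the most recent group name and version.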
There is a row corresponding to every instance of a (PM,MM) probe pair
or a PM probe aligning to a target sequence. Probe sequence is the
synthesis sequence of the probe on the chip in 5'-3' orientation. It is
the reverse complement of the target sequence.
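As a small illustration of the reverse-complement relationship, the following Python sketch computes the reverse complement of a probe sequence. The function name reverse_complement is an assumption made for this example, and the input is simply an example 25-mer.

```python
# Translation table for the bases the format allows, preserving case.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (hypothetical helper)."""
    return seq.translate(_COMPLEMENT)[::-1]


probe = "AATAGCCCTCATGTACGTCTCCTCC"
print(reverse_complement(probe))  # GGAGGAGACGTACATGAGGGCTATT
```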
Each row consists of the following 6-9 whitespace-separated entries:
- Probe sequence (must be in [acgtACGT], length must be in [1,27],
all probes must have same length).
- Alignment strand indicator: 1 or t or T or + if target (not
probe!) is on top strand, 0 or f or F or - otherwise
- Target sequence name
- Alignment position in target sequence (0-based, lower coordinate
of alignment)
- X coordinate of PM probe in array (0-based)
- Y coordinate of PM probe in array (0-based)
For a PM Only sequence block:
- Optional Match Score - float between 0.0 and 1.0
For a PM and MM sequence block:
- X coordinate of MM probe in array (0-based)
- Y coordinate of MM probe in array (0-based)
- Optional Match Score - float between 0.0 and 1.0
Mixtures of PM only probes and PM/MM probe pairs are not supported
within a sequence block.
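The row layout above can be read with a short parser along these lines. This is a sketch under stated assumptions: the names TpmapRow and parse_row are hypothetical, the row kind is inferred from the field count alone (6 or 7 fields for PM only, 8 or 9 for PM/MM), and a missing Match Score is taken as 1.0 as the format specifies. Checks that span several rows (equal probe length, no mixing of PM only and PM/MM rows within a block) are left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

TOP_STRAND = {"1", "t", "T", "+"}
BOTTOM_STRAND = {"0", "f", "F", "-"}


@dataclass
class TpmapRow:  # hypothetical record type, not part of the format
    probe_seq: str
    target_on_top: bool
    target_name: str
    position: int
    pm_xy: Tuple[int, int]
    mm_xy: Optional[Tuple[int, int]]  # None for a PM only row
    match_score: float


def parse_row(line: str) -> TpmapRow:
    fields = line.split()
    if not 6 <= len(fields) <= 9:
        raise ValueError(f"expected 6-9 fields, got {len(fields)}")
    seq, strand, target_name = fields[:3]
    if not 1 <= len(seq) <= 27 or set(seq) - set("acgtACGT"):
        raise ValueError(f"bad probe sequence: {seq!r}")
    if strand in TOP_STRAND:
        on_top = True
    elif strand in BOTTOM_STRAND:
        on_top = False
    else:
        raise ValueError(f"bad strand indicator: {strand!r}")
    position, pm_x, pm_y = (int(v) for v in fields[3:6])
    rest = fields[6:]
    mm_xy = None
    if len(rest) >= 2:  # PM/MM row: the next two fields are the MM coordinates
        mm_xy = (int(rest[0]), int(rest[1]))
        rest = rest[2:]
    score = float(rest[0]) if rest else 1.0  # absent score is set to 1.0
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"match score out of range: {score}")
    return TpmapRow(seq, on_top, target_name, position,
                    (pm_x, pm_y), mm_xy, score)
```

For example, parse_row("TAAGTAGAGAGATGGATGGTGGTTG 1 chr3 477 1297 981 .9") yields a PM only row with mm_xy set to None and a match score of 0.9.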
Tag/value lines are supported and take the form:
#tag value
where tag and value are strings. The tag name does not include the
leading '#'.
Any lines with '#' alone as the first non-whitespace string are
treated as comments.
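The distinction between comments, tag/value lines and data rows can be sketched as below. The function name classify_line and the returned labels are assumptions made for this illustration. The sketch treats #seq_group_name and #version as ordinary tags, takes only the first field after the tag as its value, and reports empty lines separately so a reader can skip them.

```python
def classify_line(line):
    """Return (kind, tag, value) for one line of a tpmap file.

    kind is one of "blank", "comment", "tag" or "data" (labels are
    hypothetical); tag and value are None unless kind is "tag".
    """
    fields = line.split()
    if not fields:
        return ("blank", None, None)
    first = fields[0]
    if first == "#":
        # '#' alone as the first non-whitespace string marks a comment.
        return ("comment", None, None)
    if first.startswith("#"):
        tag = first[1:]  # the tag name excludes the leading '#'
        value = fields[1] if len(fields) > 1 else ""
        return ("tag", tag, value)
    return ("data", None, None)
```

With this sketch, classify_line("#tag1 value1") returns ("tag", "tag1", "value1"), while a line such as "# tag1 value1" is reported as a comment because of the whitespace after the '#'.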
If the Match Score is absent then it will be set to 1.0.
Below is a sample tpmap file:
#seq_group_name HS1 (Required Line, whitespace delimited using default std::cin, only first 2 fields used).
#version 11_Nov_2005 (Required Line, as above)
#tag1 value1 (Optional tag/value line, white space delimited, leading # stripped from tagname)
#tag2 value2 (Another example of a tag/value optional line)
# this line is ignored because of white space after the #, empty lines are ignored too

GCCCTGTTGTCTCTTACCCGGATGT f chr2 28 1112 2013 1112 2014
AATAGCCCTCATGTACGTCTCCTCC f chr2 1 1290 1449 1290 1450
AATAGCCCTCATGTACGTCTCCTCC f chr2 1 1291 1449 1291 1450
AATAGCCCTCATGTACGTCTCCTCC f chr2 1 1292 1449 1292 1450
GGAGGAGACGTACATGAGGGCTATT t chr2 1 1466 949 1466 950
# this sequence block inherits the group_name and version and other tags
GTAATGGAGGGTAAGTTGAGAGACA t chr1 107 1729 1497 1729 1498
GGTAATGGAGGGTAAGTTGAGAGAC t chr1 108 509 1397 509 1398
TAGGGCTGTGTTAGGGTAGTGTTAG t chr1 64 1745 1095 1745 1096
GGTAATGGAGGGTAAGTTGAGAGAC t chr1 108 510 1397 510 1398
CACTACCCTAACACAGCCCTAATCT f chr1 68 991 1953 991 1954
GGTTAGATTAGGGCTGTGTTAGGGT t chr1 72 295 1987 295 1988
GTCTCTCAACTTACCCTCCATTACC f chr1 108 355 1437 355 1438
GTCTCTCAACTTACCCTCCATTACC F chr1 108 354 1437 354 1438
# a PMOnly block with a match score in line 2
GTAGAGAGATGGATGGTGGTTGGGA t chr3 474 2305 1565
TAAGTAGAGAGATGGATGGTGGTTG 1 chr3 477 1297 981 .9
TAAGTAGAGAGATGGATGGTGGTTG t chr3 477 1298 981
AGTAAGTAGAGAGATGGATGGTGGT t chr3 479 631 779
TAGTAAGTAGAGAGATGGATGGTGG t chr3 480 843 1211
TAGTAAGTAGAGAGATGGATGGTGG t chr3 480 844 1211
TAGTAAGTAGAGAGATGGATGGTGG t chr3 480 845 1211
TAGTAAGTAGAGAGATGGATGGTGG t chr3 480 846 1211
GTAGTAAGTAGAGAGATGGATGGTG t chr3 481 1397 1011
GTTGGTGGTAGTAAGTAGAGAGATG t chr3 488 2520 1035
GACGGTGGGTTGGTGGTAGTAAGTA t chr3 496 1000 2453

#seq_group_name B_subtilis
#version 1/1/00:11am
#tag1 controls
# has a match score in first line
CACACCCTAACACTACCCTAACACT 0 chr1 356 1069 1943 3069 1944 0.8
ACCCTAACACTACCCTAACACTACC f chr1 359 2059 2371 2059 2372
GGGTAGTGTTAGGGTAGTGTTAGGG t chr1 360 1704 2249 1704 2250
TAGGGTAGTGTTAGGGTAGTGTTAG T chr1 2012000111 1996 2209 1996 2210

# ###### Explanation of the sequence blocks:
# There is a row corresponding to every instance of a (PM,MM) probe pair or a PM probe aligning
# to a target sequence. Each row consists of the following 6-9 whitespace-separated entries:
# 1 Probe sequence (must be in [acgtACGT], length must be in [1,27], all probes must have same length)
# 2 Alignment strand indicator: 1 or t or T if target (not probe!) is on top strand, 0 or f or F otherwise
# 3 Target sequence name
# 4 Alignment position in target sequence (0-based, lower coordinate of alignment)
# 5 X coordinate of PM probe in array (0-based)
# 6 Y coordinate of PM probe in array (0-based)
# For a PM Only sequence block:
# 7 Optional Match Score - float between 0.0 and 1.0, represents how well seq matches target
# For a PM and MM sequence block:
# 7 X coordinate of MM probe in array (0-based)
# 8 Y coordinate of MM probe in array (0-based)
# 9 Optional Match Score - float between 0.0 and 1.0
#
# Mixtures of PM only probes and PM/MM probe pairs are not supported within a sequence block