AGCoL SyMAP Collinear UA

Details of Collinear sets

Use the Hit Filter, or right-click in the hit region, to set the highlighting for Collinear Sets. As shown below, the number of highlighted collinear hits and the number of collinear sets are shown in the Information box.
Collinear Sets

In the image on the right, two different sets are shown (the green highlighted hits are the 1st set, the pink are the 2nd). The sets are separated by a gene on each chromosome that does not have a hit.

Forward (=) vs. reverse (!=) sets:
A = hit is to the same strands, (+/+) or (-/-).
A != hit is to opposite strands, (+/-) or (-/+).
All hits in a set are either to the same strand (=) or to opposite strands (!=).
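
The strand rule above can be sketched in a few lines of Python. This is only an illustration of the (=) and (!=) labels; the function name `hit_orientation` is hypothetical and is not part of SyMAP.

```python
def hit_orientation(strand1: str, strand2: str) -> str:
    """Label a hit '=' (same strands) or '!=' (opposite strands).

    Hypothetical helper for illustration; not SyMAP code.
    """
    valid = {"+", "-"}
    if strand1 not in valid or strand2 not in valid:
        raise ValueError("strands must be '+' or '-'")
    return "=" if strand1 == strand2 else "!="


# (+/+) and (-/-) are '=' hits; (+/-) and (-/+) are '!=' hits.
for pair in [("+", "+"), ("-", "-"), ("+", "-"), ("-", "+")]:
    print(pair, hit_orientation(*pair))
```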

In the image on the right, all genes in both sets are on the same strand.

In the image below, all genes are on opposite strands.
Reverse hits

 

The collinear set algorithm does not consider the amount of overlap of the hit to a gene, or the similarity of the hit sequences.

2 Collinear Sets

The following are examples of interpreting the 2D view:

Sometimes a hit looks like it overlaps a gene, but the gene is actually in the gap, so it is not hit. This is illustrated on the right, where the 3 genes at the bottom appear to be hit by the red hit wire, but are in fact spanned by the gray gap area. Note that the hit popup shows it is a g0 hit (hits no genes).
Clustered hit
  
The image on the right shows one collinear set with 6 brown hit wires that are not part of the set. The following explains them, using the symbols g0 (hits 0 genes), g1 (hits 1 gene), g2 (hits paired genes, at least one on each chromosome).

1   g1   Ignored, not part of the set since it does not hit paired genes.
2   g0   Ignored.
3   g1   Ignored since the same gene has a hit to a paired gene.
4   g0   Ignored.
5   g1   This ends the set since it does not hit a paired gene.
6   g2   This could start a new set, but is not followed by another paired gene.
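
The g0/g1/g2 symbols and the way an unpaired gene ends a set can be illustrated with a simplified Python sketch. It is not the SyMAP algorithm: it looks at gene order along one chromosome only, and it assumes that a hit with genes on just one side counts as g1 and that a set needs at least two consecutive paired genes. The names `classify_hit` and `split_runs` are hypothetical.

```python
from itertools import groupby


def classify_hit(genes_chr1, genes_chr2):
    """Return 'g2', 'g1' or 'g0' from the genes a hit overlaps on each side.

    Assumption: genes on only one chromosome means g1.
    """
    has1, has2 = bool(genes_chr1), bool(genes_chr2)
    if has1 and has2:
        return "g2"
    if has1 or has2:
        return "g1"
    return "g0"


def split_runs(gene_is_paired, min_size=2):
    """Split genes (in chromosome order) into runs of consecutive paired genes.

    gene_is_paired[i] is True if gene i has at least one g2 hit. A gene with
    a g1 hit and a g2 hit is still paired (case 3); a gene with only a g1 hit
    is unpaired and ends the run (case 5). Runs shorter than min_size are
    dropped (case 6).
    """
    runs = []
    for paired, group in groupby(enumerate(gene_is_paired), key=lambda x: x[1]):
        indexes = [i for i, _ in group]
        if paired and len(indexes) >= min_size:
            runs.append(indexes)
    return runs


print(classify_hit(["geneA"], ["geneB"]))           # g2
print(classify_hit(["geneA"], []))                  # g1
print(classify_hit([], []))                         # g0
print(split_runs([True, True, True, False, True]))  # [[0, 1, 2]]
```

In the last call, the fourth gene is unpaired and ends the run, and the final paired gene is not followed by another paired gene, so it does not form a set.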

 

As shown earlier, two genes sometimes look like one, so the Gene Delimiter option is important for telling them apart.
Collinear Highlight and Gene Lines

G1 hits

Release v5.5.6

The collinear algorithm was improved for release v5.5.6.
Releases before v5.5.6 gave obviously wrong results if a gene from the hit-numbered project (hits are numbered along the project with the alphabetically lesser name) was part of two different collinear sets. An example is on the right.

In my test sets, this happened a few times among Arabidopsis, Brassica rapa, and cabbage, but never in my partial mammalian test set of human, chimp, mouse, and rabbit.

Overlapping collinear sets

Referring to the image on the right (from the demo), pre-v5.5.6 called this one collinear set, ignoring the extra embedded gene 4941.b; v5.5.6 breaks it into two sets because of gene 4941.b (as shown in the image).
v5.5.6 finds more collinear sets, and longer ones. However, since there may be some hit-gene configuration that the algorithm does not take into account, it verifies all sets and prints "Error in set x" to the terminal if there is a problem. It cannot verify that no sets are missed, although detailed inspection has not found any missed ones. Please email Cari at symap@agcol.arizona.edu if you find any problems.
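
The checks SyMAP actually runs are not described here. As a rough illustration only, the following Python sketch tests two properties stated on this page: all hits in a set share one orientation, and a gene from the hit-numbered project belongs to at most one set. The function `verify_sets` and its input layout are hypothetical.

```python
def verify_sets(sets):
    """Return "Error in set x" messages for sets that fail the checks.

    sets maps a set id to a list of (gene1, gene2, strand1, strand2) tuples,
    where gene1 is from the hit-numbered project. Hypothetical layout.
    """
    errors = []
    owner = {}  # gene1 -> id of the first set seen containing it
    for set_id, hits in sets.items():
        ok = True
        # Check 1: every hit is '=' or every hit is '!='.
        same_strand_flags = {s1 == s2 for _, _, s1, s2 in hits}
        if len(same_strand_flags) > 1:
            ok = False
        # Check 2: no gene1 already belongs to an earlier set.
        for gene1, _, _, _ in hits:
            if owner.setdefault(gene1, set_id) != set_id:
                ok = False
        if not ok:
            errors.append(f"Error in set {set_id}")
    return errors


example = {
    1: [("a", "x", "+", "+"), ("b", "y", "-", "-")],  # passes
    2: [("b", "z", "+", "-"), ("c", "w", "+", "-")],  # gene b is also in set 1
    3: [("d", "u", "+", "+"), ("e", "v", "+", "-")],  # mixed = and != hits
}
for message in verify_sets(example):
    print(message)
```

Like the verification described above, a check of this kind can flag a malformed set but cannot show that no sets were missed.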


Email Comments To: symap@agcol.arizona.edu