AGCoL SyMAP Details UA
Contents
    Collinear sets     Highlight conserved genes     Hit popup

Collinear sets

Use the Hit Filter, or right-click in the hit region, to set the highlighting for Collinear Sets. As shown below, the number of highlighted collinear hits and the number of collinear sets are shown in the Information box.
Collinear Sets

In the image on the right, two different sets are shown (the green highlighted hits are the 1st set, the pink are the 2nd). They are separated by a gene on each chromosome that does not have a hit.

Forward (=) vs reverse (!=) sets:
An = hit is to the strands (+/+) or (-/-);
a != hit is to the strands (+/-) or (-/+).
All hits in a set are either to the same strand (=) or to opposite strands (!=).
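
The strand rule above can be sketched in a few lines of Python. This is only an illustration, not SyMAP code: the function names `hit_orientation` and `set_is_consistent`, and the representation of a hit as a pair of strand characters, are assumptions made for the example.

```python
def hit_orientation(strand1: str, strand2: str) -> str:
    """Return '=' for (+/+) or (-/-) hits and '!=' for (+/-) or (-/+) hits."""
    for strand in (strand1, strand2):
        if strand not in ("+", "-"):
            raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return "=" if strand1 == strand2 else "!="


def set_is_consistent(hits) -> bool:
    """True if every hit in the list has the same orientation.

    `hits` is a list of (strand1, strand2) tuples (hypothetical layout).
    """
    return len({hit_orientation(s1, s2) for s1, s2 in hits}) <= 1
```

For example, `set_is_consistent([("+", "+"), ("-", "-")])` holds because both hits are = hits, whereas mixing a (+/+) hit with a (+/-) hit does not.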

In the image on the right, all genes in both sets are on the same strand.

In the image below, all genes are on opposite strands.
Reverse hits

 

The collinear set algorithm does not consider the amount of overlap of the hit to a gene, or the similarity of the hit sequences.

2 Collinear Sets

The following are examples of interpreting the 2D view:

Sometimes a hit looks like it overlaps a gene, but the gene is actually in the gap, so it is not hit. This is illustrated on the right, where the gene at the bottom appears to be hit by the pink hit wire, but is in fact spanned by the gray gap area. Note that the hit popup shows it is a nn type, which hits no genes.
Clustered hit
  
The image on the right shows one collinear set with 6 brown hit wires that are not part of the set. The following explains them, using the symbols g0 (hits 0 genes), g1 (hits 1 gene), g2 (hits paired genes, at least one on each chromosome).

#  Type  Explanation
1  g1    Ignored, not part of the set since it does not hit paired genes.
2  g0    Ignored.
3  g1    Ignored since the same gene has a hit to a paired gene.
4  g0    Ignored.
5  g1    This ends the set since it does not hit a paired gene.
6  g2    This could start a new set, but is not followed by another paired gene.
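
As a rough illustration of the g0/g1/g2 symbols, the sketch below classifies a hit from the number of genes it overlaps on each chromosome. It is not SyMAP code: the function name `hit_gene_class` and its two count parameters are hypothetical, and it assumes that g1 means genes are hit on only one of the two chromosomes.

```python
def hit_gene_class(genes_chr1: int, genes_chr2: int) -> str:
    """Classify a hit by the genes it overlaps on each chromosome.

    g2: at least one gene hit on each chromosome (paired genes)
    g1: gene(s) hit on only one chromosome (assumed reading of "hits 1 gene")
    g0: no genes hit
    """
    if genes_chr1 < 0 or genes_chr2 < 0:
        raise ValueError("gene counts must be non-negative")
    if genes_chr1 > 0 and genes_chr2 > 0:
        return "g2"
    if genes_chr1 > 0 or genes_chr2 > 0:
        return "g1"
    return "g0"
```

Under this reading, only g2 hits are candidates for a collinear set, which is why the g0 and g1 wires in the example are either ignored or end the set.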

 

As shown earlier, two genes sometimes look like one, so the Gene Delimiter option is important for viewing these.
Collinear Highlight and Gene Lines

G1 hits

Highlight conserved genes

As stated in Sequence Filters, when more than 2 tracks are displayed in 2D, there are two extra options for the first reference track, as shown on the right. With the Hit g2x2 option and 3 tracks, the genes conserved across the 3 tracks are highlighted, as shown below.
sequence filter reference

sequence filter conserved

The above image has the settings Sequence Filter Hit g2x2 and the Hit Filter Hit =2 genes. Hence, the pink hit-wires and cyan exons indicate the conserved genes, and the green hit-wires have genes at both ends but are not conserved across the 3 tracks.

Caveats:

  1. For a 3-chromosome view, say genes A-B-C are conserved. The hit for A-B may not overlap the hit for B-C. The algorithm only checks the A-B hit and B-C hit, but not the possible A-C hit.
  2. There may be different numbers of highlighted genes on the different tracks or hit-wires between tracks. This is because multiple hits can align to a gene, and one gene can align to two different genes on the opposing genome.
  3. When there are overlapping genes, a hit is assigned to just one of them, which causes some conserved genes to have incorrect pairing.
  4. There are occasional tiny hits that just barely overlap genes; the current algorithm marks them as conserved genes.
  5. Hits may align entirely within introns (especially the long introns of mammalian genomes); the current algorithm marks these as conserved.
Even with these caveats, I rarely see a pairing that seems 'iffy' with plant genomes, though the overlap patterns and long introns of mammalian genomes cause some questionable results. I will be working on improving the hit-to-gene assignment for the next release. As always, feedback is welcome!
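
The first two caveats can be illustrated with a small sketch. It is not the SyMAP implementation: the function `conserved_genes` and the representation of hits as gene-name pairs are assumptions for the example. It marks a gene on the middle track B as conserved when it has both an A-B hit and a B-C hit, and, as in caveat 1, never looks for an A-C hit.

```python
from collections import defaultdict


def conserved_genes(hits_ab, hits_bc):
    """Return {gene_B: (set of genes on A, set of genes on C)}.

    hits_ab: iterable of (gene_on_A, gene_on_B) pairs (hypothetical layout)
    hits_bc: iterable of (gene_on_B, gene_on_C) pairs
    Only the A-B and B-C hits are consulted; A-C hits are not checked.
    """
    b_to_a = defaultdict(set)
    for gene_a, gene_b in hits_ab:
        b_to_a[gene_b].add(gene_a)

    b_to_c = defaultdict(set)
    for gene_b, gene_c in hits_bc:
        b_to_c[gene_b].add(gene_c)

    # A gene on B is conserved if it is hit from both neighbouring tracks.
    shared = b_to_a.keys() & b_to_c.keys()
    return {gene_b: (b_to_a[gene_b], b_to_c[gene_b]) for gene_b in shared}


hits_ab = [("A1", "B1"), ("A2", "B1"), ("A3", "B2")]
hits_bc = [("B1", "C1")]
result = conserved_genes(hits_ab, hits_bc)
```

In this made-up example B1 is conserved and pairs with two genes on A but one on C, so the number of highlighted genes differs between tracks, as described in caveat 2. B2 has no B-C hit and is not marked.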

Hit popup

Default view:
  • Both lists are sorted by start coordinate.
  • The # columns in the two tables align with each other, e.g. the two rows numbered #2 align to each other.
  • The sequence name that is alphabetically lower (e.g. arab<cabb) is the query and the other is the target. The query is numbered 1-N; the target is numbered to match the query.
  • Two subhits may overlap (Gap<=0) on one chromosome but not the other.

Merge

2D Hit Merge

In the merge view, hits with gap<=0 are merged with the hit they overlap.

The numbered hits are no longer 1-to-1, i.e. they do not necessarily align to each other like they do in the un-merged view.

Totals are provided for the Len, Gap and total.

The merge view corresponds better to the visual view on the 2D track.

The presence of the Merge option is a quick indication that there are overlapping hits.
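
The merge step can be pictured as ordinary interval merging on one chromosome. The sketch below is an illustration only, not SyMAP code: the function name `merge_subhits`, the (start, end) tuples, and the gap definition (next start minus previous end) are all assumptions, since the exact formula behind the Gap column is not given here.

```python
def merge_subhits(subhits):
    """Merge subhits on one chromosome whose gap to the previous hit is <= 0.

    subhits: list of (start, end) coordinates (hypothetical layout).
    Gap is assumed to be next_start - previous_end.
    """
    merged = []
    for start, end in sorted(subhits):
        if merged and start - merged[-1][1] <= 0:
            # Overlaps the previous (possibly already merged) hit: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

For example, `merge_subhits([(100, 200), (150, 300), (400, 500)])` yields two hits, (100, 300) and (400, 500). Because the merge is done separately on each chromosome, the two lists can end up with different numbers of rows, which is consistent with the numbering no longer being 1-to-1.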

Order

2D Hit Order

As stated above, each numbered hit corresponds to the same number in the opposite list, e.g. the two #1 rows align. A '#' on the far right implies that the order numbers are not sequential.

This option produces another popup where the target list is sorted by '#', but then the coordinates are no longer sorted.

The presence of the Order option is a quick indication that there are disordered hits.
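
A minimal sketch of the ordering check follows. It is not SyMAP code: the function names and the (query_start, target_start) pairs are hypothetical, and it assumes that "sequential" means the target's # values read 1, 2, ..., N when the target list is sorted by start coordinate. Under that assumption a fully reversed order would also be reported as disordered, which may differ from what SyMAP does.

```python
def target_order(pairs):
    """Return the # values in target-start order.

    pairs: list of (query_start, target_start), one per aligned subhit pair.
    Subhits are numbered 1-N by query start; the target keeps the same numbers.
    """
    by_query = sorted(pairs, key=lambda pair: pair[0])
    numbered = [(target_start, number)
                for number, (_, target_start) in enumerate(by_query, start=1)]
    return [number for _, number in sorted(numbered)]


def is_disordered(pairs) -> bool:
    """True if the target's # values are not 1..N in coordinate order."""
    order = target_order(pairs)
    return order != list(range(1, len(order) + 1))
```

With the made-up pairs `[(10, 500), (20, 300), (30, 700)]`, the target list sorted by start reads #2, #1, #3, so the hits are disordered.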


Email Comments To: symap@agcol.arizona.edu