AGCoL SyMAP Queries UA
Content:
>Instructions
>Query Setup: General, Single gene, Pair hits, PgeneFs
Query Result Panel: Result table, Columns (Pair, Single, Auto-save), Top button, Statistics
>Results

See Terminology in the User Guide, especially the Cluster Hit Algo1 versus Algo2 description.


>Instructions

To open the query interface, first select two or more sequence projects in the Project Manager. Then select the Queries button to open the queries interface.

[Figure: Instructions window]

The Instructions window (above) lists the projects that were selected for querying. The last line states what the Olap column represents: if one or more project pairs used Algo1, it will be gene overlap; if all project pairs used Algo2, it will be exon overlap.

Open the Query Setup window by clicking on its tab in the left panel.

>Query Setup

Sections: 1. General    2. Single genes    3. Pair Hits    4. PgeneF

Set up the desired filters and then select Run Search to execute it. When the query is complete, the query result panel will be displayed.

[Figure: Query Setup window]

Rules

1. All queries are on the Pair hits unless Single is selected.

2. All Pair hits queries return a row per hit pair, i.e. an aligned region between two chromosomes from two projects.

3. Major and Minor genes: A hit can align to multiple genes on either or both chromosomes; it is assigned to the best (major) gene on each end. For example, in the image on the right, the red highlighted gene is the major gene for the highlighted hit, whereas the purple single exon gene is the minor gene.

When queries involve genes, only the hits to the major genes will be shown except for the following queries, which will also list the minor genes:
   Annotation, Every*, Hit#, Gene# when no suffix is provided, and Multi with minor checked.

4. Most filters can be used in conjunction with other filters; options will be disabled if they cannot be used with a selected filter.

[Figure: Hit overlaps two genes]

1. General

Annotation Description
Enter a substring: the entire annotation string (i.e. column All_Anno) will be searched for the substring. Hits will be returned that align to the genes with the corresponding annotation. If a location is entered, the annotated gene must be in the location limits.

Location

Chr  Select a specific chromosome for the species. This is the most common query to use in conjunction with others.
From  The Start coordinates for the selected chromosome will be >= this number.
To  The End coordinates for the selected chromosome will be <= this number.

For Pair hits, the hit coordinates must pass the From and To input integers.
For Single genes, the gene coordinates must pass the From and To input integers.

It is valid to only enter the From or To, or no value for either (leave blank).
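
The From/To rule can be summarised as a small predicate. The following is a minimal Python sketch for illustration only, not SyMAP code; the function name `in_location` and its parameters are hypothetical, and a blank input field is modelled as `None`.

```python
from typing import Optional

def in_location(start: int, end: int,
                frm: Optional[int] = None,
                to: Optional[int] = None) -> bool:
    """Return True if a hit (or gene) with the given start/end passes the
    From/To limits. A limit of None models a blank input field."""
    if frm is not None and start < frm:
        return False  # Start must be >= From
    if to is not None and end > to:
        return False  # End must be <= To
    return True
```

For Pair hits the `start`/`end` values would be the hit coordinates; for Single genes they would be the gene coordinates.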

2. Single gene

Single queries produce rows of genes; there is no hit or block information since the rows do not represent hits. Options:
  • Orphan genes (no hits)
    Genes that do not have a hit and meet the additional filters. The orphan genes are relative to the projects shown on the Instructions page. For example, if species X, Y and Z have synteny computed between all pairs, but only X and Y are selected, the orphan genes for X would be those with no hits to Y. If X, Y and Z are selected, the orphan genes for X would be those with no hits to either Y or Z.
     
  • All genes (w/o hits), i.e. genes with and without hits
    This shows all genes that meet the additional filters, regardless of whether they have a hit. There is always the same set of genes for a project, regardless of synteny.
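
The "orphan relative to the selected projects" idea can be illustrated with a short Python sketch. This is not how SyMAP computes orphans internally; the function `orphan_genes` and the example data are hypothetical, and it assumes each gene is already associated with the set of projects it has hits to.

```python
def orphan_genes(hit_species, project, selected):
    """hit_species maps a gene name to the set of projects it has hits to.
    Returns the genes of `project` with no hit to any other selected project."""
    others = set(selected) - {project}
    return sorted(g for g, sp in hit_species.items() if not (sp & others))

# Hypothetical genes of species X and the species each one has hits to.
x_hits = {"g1": {"Y"}, "g2": {"Z"}, "g3": set(), "g4": {"Y", "Z"}}

print(orphan_genes(x_hits, "X", ["X", "Y"]))       # ['g2', 'g3']
print(orphan_genes(x_hits, "X", ["X", "Y", "Z"]))  # ['g3']
```

Gene g2 is an orphan when only X and Y are selected, but not when Z is also selected, because it has a hit to Z.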

Allowed filters for singles: both options can be restricted by:

  • Using the Project pull-down, select a project.
  • If a project is selected, then a chromosome and optional location can be selected.
  • An Annotation Description may be entered.

3. Pair hits

Each hit connects two species (projects) and hence represents a pair of aligned regions for two of the selected species. Filters are as follows:

In Block (Synteny Hit)

Yes Only hits that are part of a synteny block will be returned. All hits will have a value for the Block column.
No Only hits that are NOT part of a synteny block will be returned. No hits will have a value for the Block column.

Annotated (Gene Hit)

Every  Only hits that align to a gene on one or both sides of the hit will be shown. The Gene# columns will list the best gene. See Rules for genes/chr.
Every*  This is like the Every option, but a hit may be listed multiple times if it aligns to overlapping genes; in other words, ALL genes will be shown if they have any alignment. The Gene# for non-best hits will be suffixed with an "*" and the Htype is unknown. See Rules for genes/chr.
One  Only hits that align to a gene on ONE end will be shown. The Gene# columns will show the best gene for the end with the hit. See Rules for genes/chr.
Both  Only hits that align to genes on BOTH ends will be shown. The Gene# columns will show the best gene for both ends of the hit.
None Only hits that do NOT align to a gene on either end will be shown. The Gene# columns will not have a value for either end of the hit.
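
The Annotated options can be read as a test on how many ends of a hit have a gene. The following Python sketch is only an illustration under that reading; `passes_annotated` is a hypothetical name, "One" is assumed to mean exactly one end, and Every* is left out because it changes which rows are listed for a hit rather than which hits pass.

```python
def passes_annotated(gene1, gene2, option):
    """gene1/gene2: Gene# of the best gene on each end of the hit, or None
    if that end does not align to a gene."""
    ends_with_gene = (gene1 is not None) + (gene2 is not None)
    if option == "Every":
        return ends_with_gene >= 1
    if option == "One":
        return ends_with_gene == 1  # assumption: exactly one end
    if option == "Both":
        return ends_with_gene == 2
    if option == "None":
        return ends_with_gene == 0
    raise ValueError(f"unknown option: {option}")
```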

Collinear size >= N (or = N or <= N)
List all hits in collinear sets that have size >= N. See Collinear, which explains the SyMAP collinear sets.

Multi >= N
List all hits that align to genes on at least one end with >= N hits. Check Minor* to include minor hits.
When used with a specified chromosome, the results are restricted to genes with >= N hits on the specified chromosome; see Rules for genes/chr.
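
As a rough illustration of the Multi filter, the Python sketch below counts hits per gene and keeps the hits where at least one end aligns to a gene with >= N hits. It is a simplified model, not SyMAP's implementation: the function `multi_hits` and the data layout are hypothetical, and minor genes and the chromosome restriction are ignored.

```python
from collections import Counter

def multi_hits(hits, n):
    """hits: list of (hit_id, gene_on_end1, gene_on_end2); a gene may be None.
    Returns the ids of hits where a gene on at least one end has >= n hits."""
    counts = Counter()
    for _, g1, g2 in hits:
        for gene in (g1, g2):
            if gene is not None:
                counts[gene] += 1
    return [hit_id for hit_id, g1, g2 in hits
            if any(g is not None and counts[g] >= n for g in (g1, g2))]

# Hypothetical hits; gene names are prefixed with a project label so that
# genes from different projects never share a key.
hits = [(1, "A:1.10", "B:2.7"), (2, "A:1.10", "B:2.9"), (3, "A:1.12", None)]
print(multi_hits(hits, 2))  # [1, 2]
```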

Block#
Enter a single block number (the Block column is formatted Chr.Chr.Block#). Use it in conjunction with the chromosome pull-downs. For example, if you select Chr 1 from the first project, Chr 2 from a second project, and enter block=3, you will see hits in block 1.2.3 from the two project chromosomes you selected.

Collinear Set#
Enter a set number (the Collinear column is formatted Chr.Chr.Size.Set#). All hits in a collinear set have the same Set#, which is the last number of the Collinear column. This can be used in conjunction with the chromosome pull-downs (as for blocks).
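
When working with exported results, the Block and Collinear values can be split into their parts. The Python sketch below is a hypothetical helper, not part of SyMAP; it assumes chromosome names contain no "." character and keeps them as strings because they need not be numeric.

```python
def parse_block(value):
    """Split a Block value formatted Chr.Chr.Block#."""
    chr1, chr2, block = value.split(".")
    return chr1, chr2, int(block)

def parse_collinear(value):
    """Split a Collinear value formatted Chr.Chr.Size.Set#."""
    chr1, chr2, size, set_no = value.split(".")
    return chr1, chr2, int(size), int(set_no)

print(parse_block("1.2.3"))          # ('1', '2', 3)
print(parse_collinear("1.2.5.100"))  # ('1', '2', 5, 100)
```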

Hit#
The hits are numbered for each chromosome pair, e.g. there will be a Hit#1 for every chromosome pair that has hits. If a hit aligns to multiple genes, both major and minor genes will be shown.

Gene#
Enter a Gene# number (the Gene# column is formatted Chr.Gene#.suffix where the suffix may be blank); do not include the 'Chr' number.
 • If only a number is entered, all genes with the numeric prefix will be displayed (including minor hits).
 • If a number.suffix is entered, the exact gene will be displayed (but no minor hits).
 • See Rules for genes/chr
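
The two forms of Gene# input can be sketched as a matching rule. The following Python is an illustration only; `gene_matches` is a hypothetical name, the handling of the "*" marker on minor genes is left out, and "numeric prefix" is assumed to mean the number part of Chr.Gene#.suffix.

```python
def gene_matches(gene_col, query):
    """gene_col: a Gene# column value, 'Chr.Gene#' or 'Chr.Gene#.suffix'.
    query: the user input without the Chr part, e.g. '12' or '12.a'."""
    parts = gene_col.split(".")
    number = parts[1]
    suffix = parts[2] if len(parts) > 2 else ""
    if "." in query:
        # number.suffix entered: only the exact gene matches
        q_number, q_suffix = query.split(".", 1)
        return number == q_number and suffix == q_suffix
    # number only: any suffix (or none) matches
    return number == query
```

For example, the input 12 would match both 3.12.a and 3.12.b, whereas 12.a would match only 3.12.a.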

Rules for genes/chromosome queries: The following queries involve genes and how they respond to a specified Chromosome. All queries first filter on location, but only a "Yes" further restricts the genes to the selected chromosome(s).

Selected Chr   Option               Release
No             Every, Every*, One   v5.4.7
Yes            Multi, Gene#         v5.4.9
For example, Every and Multi with N=1 give the same results, but when restricted to a chromosome, Every does not require the gene to be on the specified chromosome, whereas Multi with N=1 does.

4. Filter putative gene families (PgeneFs)

Note: this has not been tested for a long time except superficially (i.e. making sure it has the same results as the previous release).

This is only computed if Compute PgeneF is checked. Using the hits that pass the other filters, SyMAP constructs putative gene families (PgeneFs) spanning the selected species. This is done by grouping hits which overlap on at least one genome. Note that if you have more than 6 species selected, this stage can take an hour or more.

Each PgeneF is given a number, which is shown in the Query Results table (column name PgeneF). The size of the PgeneF is also shown (column PgFSize).
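
The grouping of hits that overlap on at least one genome can be pictured as connected components. The Python sketch below uses a simple union-find with a pairwise comparison; this is a swapped-in textbook technique chosen for illustration, not SyMAP's actual algorithm, and the names `overlaps` and `group_hits` and the data layout are hypothetical.

```python
def overlaps(a, b):
    """Regions are (species, chr, start, end); they overlap only if they are
    on the same species and chromosome and their coordinates intersect."""
    return a[0] == b[0] and a[1] == b[1] and a[2] <= b[3] and b[2] <= a[3]

def group_hits(hits):
    """hits: list of (end1, end2) region pairs. Returns one group number per
    hit; hits that overlap on at least one genome share a number."""
    parent = list(range(len(hits)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(hits)):
        for j in range(i + 1, len(hits)):
            if any(overlaps(a, b) for a in hits[i] for b in hits[j]):
                parent[find(j)] = find(i)

    numbers = {}
    return [numbers.setdefault(find(i), len(numbers) + 1)
            for i in range(len(hits))]

# Hypothetical hits: the first two overlap on species A, chromosome 1.
hits = [
    (("A", "1", 100, 200), ("B", "2", 500, 600)),
    (("A", "1", 150, 250), ("C", "3", 10, 90)),
    (("B", "5", 1, 50), ("C", "7", 1, 50)),
]
print(group_hits(hits))  # [1, 1, 2]
```

As with the PgeneF column, the group numbers here are generated per run and carry no meaning across runs.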

Filters using the PgeneF values:

Include/Exclude
These filters permit searching for gene families shared by one group of species but not present in another.

If a species is checked to include, then the PgeneF will only be retained if it includes at least one hit to that species.

For >2 species only: If a species is checked to exclude, then the PgeneF will be discarded if any of its hits are to that species.

For the included species:

No annotation to the included species Find PgeneFs which are not yet annotated. A PgeneF will be discarded if it is annotated on any of the species which are checked in the Include line.
Complete linkage if included species For >2 species only: Require the PgeneF to be fully linked, i.e. for each pair of species A and B in the group, there must be a hit linking A to B.
At least one hit for each included species For >2 species only: PgeneF hits will only be shown if they have hits to the included species, although the PgeneF numbers will reflect groupings created using all hits.
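
The Include/Exclude and Complete linkage rules can be sketched as two small checks. This Python is a simplified illustration, not SyMAP code; `keep_pgenef` and `fully_linked` are hypothetical names, and a PgeneF is modelled only by the species its hits touch and by the species pairs its hits connect.

```python
from itertools import combinations

def keep_pgenef(family_species, include=(), exclude=()):
    """family_species: set of species touched by any hit in the PgeneF.
    Retain it only if every included species is present and no excluded
    species is present."""
    if any(s not in family_species for s in include):
        return False
    if any(s in family_species for s in exclude):
        return False
    return True

def fully_linked(hit_pairs, species):
    """hit_pairs: (speciesA, speciesB) for each hit in the PgeneF.
    True if every pair of the given species is linked by at least one hit."""
    linked = {frozenset(pair) for pair in hit_pairs}
    return all(frozenset(p) in linked for p in combinations(species, 2))

print(keep_pgenef({"A", "B"}, include=["A", "B"], exclude=["C"]))  # True
print(fully_linked([("A", "B"), ("B", "C")], ["A", "B", "C"]))     # False
```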

Query Result Panel

Sections: 1. Results table    2. Columns    3. Top buttons    4. Statistics

1. Results Table

[Figure: A pair hits table]
  • Pair Hits:
    • The table contains columns for all of the selected species, but each hit only connects two species, and the other species columns are empty.
    • Each Hit# is only listed once unless minor genes are included (see Rules).
    • A gene may be listed more than once if multiple hits align to it with a best overlap (i.e. major gene with multiple hits).

  • Single genes:
    • If the query specified Single genes, then each row represents one gene and shows data only for one species.

You can sort the columns by clicking the column name in the table, and rearrange them by dragging the column name. You can add/remove columns using the Select Columns button at the bottom.

2. Columns

Sections: Pair hits columns    Single gene columns    Auto-save columns

[Figure: column selection panel]

The buttons at the bottom are Select Columns and Hide Stats. When Select Columns is selected, it changes to Hide Columns, and Hide Stats is replaced with Clear and Defaults.

In the column panel shown above, hover over a column name to see its brief description. Following are the full descriptions of the columns.

2.1 Pair hits columns

General
Row  Row number within the table
Block  Synteny block containing this hit (if any). The format is Cn.Cm.Block#, where Cn and Cm are the chromosome numbers.
Block Hits  The number of hits which comprise the synteny block.
Collinear  Collinear set containing this hit (if any). The format is Cn.Cm.Size.Set# (e.g. 1.2.5.100; there are 5 adjacent gene hits in set# 100 on Chr1 to Chr2).
PgeneF  If Compute PgeneF: PgeneF number. All hits in a given PgeneF have the same number. Note that the number is generated during the search so will not be the same in a different search. Sort on this column to view the other rows with same PgeneF.
PgFSize  If Compute PgeneF: Size of the PgeneF which contains this hit.
Hit#  The hit number; hits are numbered sequentially for each chromosome pair. This number is shown on the Chromosome Explorer when the mouse is over the hit line.
Hit %Id  Percent identity of the alignment. The value of the "Identity" column is from the MUMmer file. If the hit has subhits, then this is an approximation.
Hit %Sim  Percent similarity of the alignment (as determined by the BLOSUM scoring matrix). The value of the "Similarity" column is from the MUMmer file. If the hit has subhits, then this is an approximation.
Hit #Subs  The number of subhits in a clustered hit.
Hit St  If "=", both hit ends are to the same strand; if "!=", they are to different strands.
Hit Cov  The lengths of the subhits within a clustered hit are summed, taking overlaps into account; the larger of the two sides' sums is shown. See Clustered Hits in Terminology.
Hit Type  There are two alternative algorithms for clustering the hits on database creation.
Algo1: g2 (two genes), g1 (one gene), g0 (no genes).
Algo2: E is exon, I is intron, n is intergenic. There will be 2 characters, one for each project, where the 1st letter goes with the alphabetically lesser project name; e.g. 'EI' indicates the hit covers A.thal exon(s) and Cabbage intron(s).
Gene&Hit Info: one row for each species
Chr  Chromosome of the hit.
Gstart/Gend/Gst  Start and end of the annotated gene. The Gst is the strand (+/-).
Gene#  The gene number is C.#.{a-z}. The C is the chromosome number. The # is the sequential number along the chromosome. If a run of genes overlap, they receive the same gene number with different suffixes {a-z, a2-z2, etc}. This is shown on the Chromosome Explorer "Annotation Description" and when the mouse hovers over the gene.
Hstart/Hend  Start and end of the hit region.
Hlen  Hend-Hstart+1
Olap  The value depends on which Cluster Hit algorithm was used.
Algo1: If any of the project pairs used Algo1, then this column will be the gene overlap for all project pairs.
Algo2: If all of the project pairs used Algo2, then this column will be the exon overlap.
NOTE: For gene overlap, Algo2 takes into account gaps between the subhits, whereas Algo1 does not.
Annotation: one row for each species
The keywords for the annotations of each species are listed; they can be different for each species. See Project Parameters for modifying the keywords shown. The Anno Key Count can be modified at any time using symap (not viewSymap).
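
The Hit Cov description in the General columns (summing subhit lengths while accounting for overlaps, then taking the longer side) can be illustrated with an interval merge. The Python sketch below is an assumption about how such a value could be computed, not SyMAP's code; `covered_length` and `hit_cov` are hypothetical names, and coordinates are treated as inclusive, in line with Hlen = Hend-Hstart+1.

```python
def covered_length(subhits):
    """Sum the lengths of (start, end) intervals, counting overlapping
    bases only once. Coordinates are inclusive."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(subhits):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start + 1  # close the previous run
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)           # extend the current run
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total

def hit_cov(side1, side2):
    """Return the larger covered length of the two sides of a clustered hit."""
    return max(covered_length(side1), covered_length(side2))

# Hypothetical subhit coordinates for the two sides of one clustered hit.
side1 = [(100, 199), (150, 249)]      # overlapping: covers 100..249
side2 = [(1000, 1099), (1200, 1219)]  # disjoint: 100 + 20 bases
print(hit_cov(side1, side2))  # 150
```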

2.2. Single genes columns

The single genes tables only have the Gene Info and Annotation columns, with one additional column, as follows.
NumHits This is the number of hits to the gene in the ENTIRE database, except for SELF synteny.
For example, if X, Y and Z species have all been compared for synteny, but only Y and Z are being queried, the Orphans for Y can have NumHits>0 if it hits X.
[Figure: Single result from 3 selected species]
[Figure: Single result from 2 selected species]

In the above tables, the A.thal gene #2 is in the 2-species table because it has no hit to Cabbage; but it is not in the 3-species table because it does have a hit to the third selected species. The value of NumHits is the same in both. That is, both tables show orphan genes relative to the selected set, but NumHits lets the user know whether a gene is an orphan over all the computed sets.

The other query on singles is for All genes, which provides the same columns.

2.3 Auto-save columns

The columns selection is saved in a file called .symap_saved_props in the user's home directory. If you only have one SyMAP database, there is no need to read the following:

Say the last database Dn queried had N species, and the current database Dm has M species, where Dn and Dm are different SyMAP databases:

  1. If N=M: the General and Gene&Hit Info columns will be set according to the previous settings, but the Annotation columns may be wrong.
  2. If N>M or N<M: the General will be according to the previous settings, all Gene&Hit Info columns will be set like the first Dn species, and no Annotation columns will be set.
This feature was implemented in v5.3.2.

3. Top button functions

[Figure: top buttons]

Show: For the selected row, a popup will show all columns and associated information for the hit. The text in the popup can be copied.

Align: Select one or more rows. The sequences of the selected hit(s) are written out and a multiple alignment is created using MUSCLE (Edgar 2004 NAR:32). The figure on the right shows the MUSCLE alignment of two genes from B.rapa and genes from Cabbage and A.thal.
View 2D: This displays the 2D view for the selected entry. The following 3 views are provided using the drop-down (the image below shows a Collinear set of 5).
Region (kb) The selected hit is padded to each side by the amount indicated (default 50kb). The default filter is Show all hits.
Synteny block The entire block for this hit will be shown. The default filter is Show Block Hits.
Collinear set If the hit is in a collinear set, the set will be shown. The default filter is Highlight Collinear Set; Show Block and Set Hits.
[Figure: 2D view of a collinear set of 5]
After the initial display, the 2D view can be changed as described in the User Guide.

High: If this is selected, the selected hit is highlighted in red; the selected color is the same as the hover color, and can be changed in the Color Icon.

Export: One or more rows can be selected for the following exports; or if no rows are selected, the entire table is exported.

CSV: Export the rows using the selected set of columns to a CSV format suitable for import into Excel.

HTML: Export the rows using the selected set of columns to an HTML format suitable for viewing on the web (e.g. Example).

FASTA: (Pairs Only) Sequences from the rows are written to a FASTA file. Both sides of each hit are written using the start/end coordinates shown in the table. NOTE: this is a very slow function and takes minutes if many rows are selected.

[Figure: export options]
The Include Row column option is available because this column is always present, but it may be desirable to not include it in the output.

4. Statistics

Statistics for the query results are shown at the bottom of the results table. They can be hidden by selecting Hide Stats.

[Figure: statistics panel]

Most of the statistics are self-explanatory except the following:

Annotated and Genes: The first is the number of hits that overlap one or more genes, where a gene can have multiple hits. The second is the number of Genes with at least one hit.

Regions: This statistic is only shown if Compute PgeneF was checked, and it is the number of distinct regions covered on that species.

>Results

[Figure: table of query results]

All query results are listed under the >Results tab on the left. Clicking this tab shows the table of queries illustrated above. A query result can be displayed by clicking it in the left panel, or by double-clicking it in the results table shown above.

The only way to remove query results from the left tab is by selecting them in this table followed by Remove Selected Query.

Email Comments To: symap@agcol.arizona.edu