AGCoL SyMAP Queries UA
Content:
         > Instructions        
> Query Setup        
General
Single genes
Pair hits
Gene Groups
Query Result Panel  
Result table    
Columns    
   Pair, Single, Auto-save    
Statistics
Top buttons
   View 2D, Report...
3-chromosomes
> Results

See Release for the latest v5.6.1 and v5.6.2 changes.

See Terminology in the User Guide, especially the Cluster Hit Algo1 versus Algo2 description.


> Instructions

To open the query interface, first select two or more sequence projects in the Project Manager. Then select the Queries button at the lower right to open the queries interface.
The Instructions window (on right) lists the projects which were selected for querying.

The Notes state what the Olap column represents: if one or more project pairs used Algo1, it is gene overlap; if all project pairs used Algo2, it is exon overlap.

Open the Query Setup window by clicking on its tab in the left panel.


> Query Setup

      Sections:    1. General    2. Single genes    3. Pair Hits    4. Gene Groups

Set up the desired filters and then select Run Search to execute it. When the search is complete, the query result panel will be displayed.

 

Rules:

1. Most filters can be used in conjunction with other filters; options will be disabled if they cannot be used with a selected filter.

2. The Single queries return rows of genes rather than hits, and each row lists a gene for only one project.

3. All Pair hits queries return a row per hit pair, i.e. an aligned region between two chromosomes from two projects.

4. All Hit# are uniquely numbered for a chromosome pair, e.g. there will be a Hit# 3 on all chromosome pairs that have hits.
5. All Gene# are sequentially numbered per chromosome, e.g. there will be a Gene# 3 on all chromosomes that have at least 3 non-overlapping genes. Overlapping genes have the same number, but different suffixes, e.g. 4.b.

6. Major and Minor genes:

  1. A hit can align to multiple genes on either or both chromosomes; it is assigned to the best (major) gene on each end.
  2. For example, in the image on the right, the pink gene is the major gene as the hit (pink line) fits it best, whereas the burgundy gene is the minor gene.
  3. When queries involve genes, only the hits to the major genes will be shown except for the following queries, which will also list the minor genes:
          Every*, Hit#, Gene#, and Multi with Minor* checked.
Hit overlaps two genes

1. General

Annotation Description
Enter a substring: the entire annotation string (i.e. column All_Anno) will be searched for the substring. Hits will be shown that have at least one gene, of two possible, with the annotation.

Location

Chr  Select a specific chromosome for the species.
From  The Hit Start coordinates for the selected chromosome will be >= this number.
To  The Hit End coordinates for the selected chromosome will be <= this number.

It is valid to enter only the From or To, or leave both blank.

The From and To are disabled for Single genes, Block#, Collinear set#, Hit#, and Gene#.
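
As a rough illustration of the From/To rule, the sketch below tests a hit against optional bounds. The function name `in_location` and the `(start, end)` tuple layout are assumptions made for this example; they are not part of SyMAP.

```python
from typing import Optional


def in_location(hit_start: int, hit_end: int,
                from_pos: Optional[int] = None,
                to_pos: Optional[int] = None) -> bool:
    """Return True if the hit passes the From/To bounds; a blank bound is ignored."""
    if from_pos is not None and hit_start < from_pos:
        return False  # Hit Start must be >= From
    if to_pos is not None and hit_end > to_pos:
        return False  # Hit End must be <= To
    return True


# Hypothetical hits given as (Hstart, Hend) on the selected chromosome.
hits = [(100, 250), (300, 900), (950, 1200)]
kept = [h for h in hits if in_location(*h, from_pos=200, to_pos=1000)]
print(kept)  # [(300, 900)]
```

Leaving both bounds as `None` corresponds to leaving From and To blank, in which case every hit on the chromosome passes.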

2. Single genes

The Single queries produce rows of genes; there is no hit or block information since the rows do not represent hits.
Options:
  • Orphan genes (no hits)
    Genes that do not have a hit and meet the additional filters. The orphan genes are relative to the projects shown on the Instructions page. For example, if species X, Y and Z have synteny computed between all pairs, but only X and Y are selected, the orphan genes for X would be those with no hits to Y. If X, Y and Z are selected, the orphan genes for X would be those with no hits to either Y or Z.
     
  • All genes (w/o hits), i.e. genes with and without hits
    This shows all genes that meet the additional filters, regardless of whether they have a hit. There is always the same set of genes for a project, regardless of synteny.
Unselect species: When Single or Gene# is selected, the check boxes beside the species names are activated. To view the genes from just one species, uncheck the others. Additionally, a single chromosome can be selected, but a location cannot be entered.
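
The orphan definition above (no hits to any of the other selected species) can be sketched as set logic. This is only an illustration: the function `orphan_genes`, the data layout, and the gene names are hypothetical and do not reflect how SyMAP stores or queries its data.

```python
def orphan_genes(genes, hits, species, selected):
    """Return the genes of `species` with no hit to any other selected species.

    genes: dict mapping species name -> set of gene ids
    hits:  iterable of (species_a, gene_a, species_b, gene_b) tuples
    """
    others = set(selected) - {species}
    with_hit = set()
    for sp_a, g_a, sp_b, g_b in hits:
        if sp_a == species and sp_b in others:
            with_hit.add(g_a)
        elif sp_b == species and sp_a in others:
            with_hit.add(g_b)
    return genes[species] - with_hit


# Hypothetical data: x1 hits Y, x2 hits Z, x3 hits nothing.
genes = {"X": {"x1", "x2", "x3"}, "Y": {"y1"}, "Z": {"z1"}}
hits = [("X", "x1", "Y", "y1"), ("X", "x2", "Z", "z1")]

print(sorted(orphan_genes(genes, hits, "X", ["X", "Y"])))       # ['x2', 'x3']
print(sorted(orphan_genes(genes, hits, "X", ["X", "Y", "Z"])))  # ['x3']
```

Gene x2 is an orphan when only X and Y are selected, but not when Z is added, which mirrors the X, Y, Z example in the text.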

3. Pair hits

Each hit connects two species (projects) and hence represents a pair of aligned regions for two of the selected species.

Filters are as follows:


In Block (Synteny Hit)

Yes Only hits that are part of a synteny block will be returned. All hits will have a value for the Block column.
No Only hits that are NOT part of a synteny block will be returned. No hits will have a value for the Block column.

Annotated (Gene Hit)

Every  Only hits that align to a gene on one or both sides of the hit will be shown. The Gene# columns will list the best gene.
Every*  This is like the Every option, but a hit may be listed multiple times if it aligns to minor genes. The minor Gene# will be suffixed with an "*" and the Hit Type is unknown.
One  Only hits that align to a gene on ONE end will be shown. The Gene# column will show the best gene for the end with the hit.
Both  Only hits that align to genes on BOTH ends will be shown. The Gene# column will show the best gene for both ends of the hit.
None  Only hits that do NOT align to a gene on either end will be shown. The Gene# column will not have a value for either end of the hit.
If One is selected together with a species chromosome, the hit must be on that chromosome, but the "one" gene can be on either end. The same applies to Every and Every*.
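
The Every, One, Both and None options reduce to counting how many ends of a hit align to a gene. The sketch below shows that reading; `passes_annotated` is a hypothetical helper, and Every* is left out because it changes which rows are listed (minor genes) rather than which hits pass.

```python
from typing import Optional


def passes_annotated(gene1: Optional[str], gene2: Optional[str], option: str) -> bool:
    """Apply the Annotated (Gene Hit) filter to one hit.

    gene1/gene2 are the Gene# values for the two ends; None means no gene.
    """
    n_genes = (gene1 is not None) + (gene2 is not None)
    if option == "Every":
        return n_genes >= 1   # a gene on one or both ends
    if option == "One":
        return n_genes == 1   # a gene on exactly one end
    if option == "Both":
        return n_genes == 2   # a gene on both ends
    if option == "None":
        return n_genes == 0   # no gene on either end
    raise ValueError(f"unknown option: {option}")


print(passes_annotated("1.59", None, "One"))   # True
print(passes_annotated("1.59", None, "Both"))  # False
```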

Collinear size

>= [=, <=] N   List all hits in collinear sets that have size >= N, = N, or <= N, respectively.
Ignore   Do not filter on collinear set sizes.
See Collinear, which explains the SyMAP collinear sets.

For the following 4 filters, do not include the 'Chr' number. Instead use the chromosome pull-downs to narrow the search to a specific chromosome, as shown in the Block# example.

Block#
Enter a single block number (the Block column is formatted Chr.Chr.Block#).
This will display all hits with this block number from any chromosome pair.
Example using the chromosome pull-downs: if you select Chr 1 from the first project, Chr 2 from a second project, and enter block=3, you will see hits in block 1.2.3.

Collinear Set#
Enter a set number (the Collinear column is formatted Chr.Chr.Size.Set#).
This will display all hits with this set number from any chromosome pair. See Block example.
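
When working with exported values, the Block and Collinear column formats described above can be split into their parts. The sketch below is a minimal example under the assumption that chromosome labels contain no "." character; the names `parse_block` and `parse_collinear` are made up for illustration.

```python
from typing import NamedTuple


class Block(NamedTuple):
    chr1: str
    chr2: str
    number: int


class CollinearSet(NamedTuple):
    chr1: str
    chr2: str
    size: int
    number: int


def parse_block(value: str) -> Block:
    """Split a Block value formatted Chr.Chr.Block#."""
    chr1, chr2, number = value.split(".")
    return Block(chr1, chr2, int(number))


def parse_collinear(value: str) -> CollinearSet:
    """Split a Collinear value formatted Chr.Chr.Size.Set#."""
    chr1, chr2, size, number = value.split(".")
    return CollinearSet(chr1, chr2, int(size), int(number))


print(parse_block("1.2.3"))
print(parse_collinear("1.2.5.100"))

# Mimic a "Collinear size >= 5" filter on hypothetical column values.
values = ["1.2.5.100", "1.2.3.7", "2.4.8.1"]
big = [v for v in values if parse_collinear(v).size >= 5]
print(big)  # ['1.2.5.100', '2.4.8.1']
```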

Hit#
Enter a hit number. Both major and minor gene hits will be shown.
This will display all hits with this number from any chromosome pair. See Block example.

Gene#
Enter a Gene# number (the Gene# column is formatted Chr.Gene#.suffix, where suffix may be blank).
A gene will only show if it has a hit. All hits with the Gene# on either end will be shown.
If a gene has a suffix:
 • If only a number is entered, all genes with the numeric prefix will be displayed (including minor hits).
 • If a number.suffix is entered, the exact gene will be displayed, including minor hits.
Select a chromosome:
 • The hit has to have one end on the chromosome, but the Gene# can be on either end.
 • By unselecting the other species (see Unselect species), the Gene# will only be on the specified end.
Group:
 • When a gene hits multiple places on the same opposite chromosome, the hits are put in a group
   (see Grp# column), which can be viewed with the 2D Group pull-down option (similar to Multi-hit).
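
The number versus number.suffix matching rule can be sketched as follows. This is an illustration only: `gene_matches` is a hypothetical function, and it assumes the column value has the Chr.Gene#.suffix form, possibly followed by the "*" that marks a minor gene.

```python
def gene_matches(column_value: str, query: str) -> bool:
    """Compare a Gene# column value (e.g. '1.4.b') with a query ('4' or '4.b')."""
    parts = column_value.rstrip("*").split(".")  # drop the minor-gene marker
    number = parts[1]
    suffix = parts[2] if len(parts) > 2 else ""
    q_number, _, q_suffix = query.partition(".")
    if number != q_number:
        return False
    # A bare number matches every suffix; number.suffix must match exactly.
    return q_suffix == "" or q_suffix == suffix


column = ["1.4", "1.4.a", "1.4.b", "1.14", "2.4.b*"]
print([g for g in column if gene_matches(g, "4")])    # ['1.4', '1.4.a', '1.4.b', '2.4.b*']
print([g for g in column if gene_matches(g, "4.b")])  # ['1.4.b', '2.4.b*']
```

The chromosome part is deliberately ignored here, since the text says the chromosome is chosen with the pull-downs rather than typed into the Gene# field.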

4. Gene Groups

   Sections:     1. Multi-hit Genes    2. PgeneF (putative gene families)   

These two options are computed on the fly. They produce query results with values for the Grp# and GrpSize columns; this in turn allows the View 2D Group option to be used.

4.1. Multi-hit Genes >= N

List all hits for genes that have >= N hits.
The target gene refers to the gene with >= N hits to the opposite species.
If Both Annotated Genes is selected, the multi-hit genes are confined to those whose N hits are to genes.
The options are as follows:

  Minor*  Include minor hits on either chromosome.
  Tandem  (Annotated species only): The >=N hits must be to a tandem array of genes. Same Chr is automatic with this option.
  Same Chr  The >=N hits must all be on the same opposite chromosome.
  Diff Chr  The >=N hits may be on any set of opposite chromosomes.
The table can list the same hit multiple times: gene X and gene Y may be connected by a hit and both genes may have >=N hits, so the hit needs to be shown in both groups.

An example is shown on the right, where Hit#881 is in both Atha Grp#3 and Brap Grp#15. The image below shows the two groups, where the group hits are highlighted in magenta; these were produced using the View 2D option Group (selecting any hit in the group results in the same 2D display).
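
To see why one hit can land in two groups, the sketch below groups hits by gene and keeps the genes with at least N hits. It is a simplified illustration, not SyMAP's code: `multi_hit_groups` and the gene labels are hypothetical, and the Same Chr, Diff Chr, Tandem and Minor* options are not modeled.

```python
from collections import defaultdict


def multi_hit_groups(hits, n):
    """Map each gene with >= n hits to the list of its hit numbers.

    hits: list of (hit_number, gene_a, gene_b), with gene labels unique across species.
    """
    by_gene = defaultdict(list)
    for hit_number, gene_a, gene_b in hits:
        by_gene[gene_a].append(hit_number)
        by_gene[gene_b].append(hit_number)
    return {gene: nums for gene, nums in by_gene.items() if len(nums) >= n}


# Hypothetical hits: gene A1 and gene B1 each have two hits and share hit 881.
hits = [(881, "A1", "B1"), (882, "A1", "B2"), (883, "A2", "B1")]
groups = multi_hit_groups(hits, 2)
print(groups)  # {'A1': [881, 882], 'B1': [881, 883]}
```

Hit 881 appears under both A1 and B1, so a table with one row per group member lists it twice.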

multi-hit results

multi-hit 2D

4.2. PgeneF (putative gene families)

Note: this has not been tested for a long time except superficially (i.e. making sure it has the same results as the previous release).
Using the hits that pass the other filters, SyMAP constructs putative gene families (PgeneFs) spanning the selected species. This is done by grouping hits which overlap on at least one genome.
Additional options are provided when >2 species are selected. Note, if you have more than 6 species selected, this stage can take an hour or more.

Each PgeneF is given a number, which is shown in the Query Results table (column name PgeneF). The size of the PgeneF is also shown (column PgFSize).
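
One way to picture "grouping hits which overlap on at least one genome" is a union-find over hits, where two hits are joined if any of their regions overlap on the same species and chromosome. The sketch below is only that reading, using a simple all-pairs comparison; the source does not describe SyMAP's actual algorithm, and every name and coordinate here is hypothetical.

```python
from itertools import combinations


def overlaps(r1, r2):
    """Regions are (species, chrom, start, end) with inclusive coordinates."""
    return (r1[0] == r2[0] and r1[1] == r2[1]
            and r1[2] <= r2[3] and r2[2] <= r1[3])


def group_hits(hits):
    """Return a 1-based group number for each hit; hits is a list of (region_a, region_b)."""
    parent = list(range(len(hits)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(hits)), 2):
        if any(overlaps(a, b) for a in hits[i] for b in hits[j]):
            parent[find(i)] = find(j)

    labels = {}
    return [labels.setdefault(find(i), len(labels) + 1) for i in range(len(hits))]


hits = [
    (("X", "1", 100, 200), ("Y", "3", 500, 600)),
    (("X", "1", 150, 250), ("Z", "2", 10, 90)),   # overlaps the first hit on X chr 1
    (("Y", "5", 1, 50), ("Z", "7", 1, 50)),       # overlaps nothing
]
print(group_hits(hits))  # [1, 1, 2]
```

The returned numbers play the role of the PgeneF column, and the count of hits sharing a number plays the role of PgFSize.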

Filters using the PgeneF values:

Include/Exclude
These filters permit searching for gene families shared by one group of species but not present in another.

If a species is checked to include, then the PgeneF will only be retained if it includes at least one hit to that species.

For >2 species only: If a species is checked to exclude, then the PgeneF will be discarded if any of its hits are to that species.

For the included species:

No annotation to the included species  Find PgeneFs which are not yet annotated. A PgeneF will be discarded if it is annotated on any of the species which are checked in the Include line.
Complete linkage if included species  For >2 species only: Require the PgeneF to be fully linked, i.e. for each pair of species A and B in the group, there must be a hit linking A to B.
At least one hit for each included species  For >2 species only: PgeneF hits will only be shown if they have hits to the included species, although the PgeneF numbers will reflect groupings created using all hits.
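
The Include, Exclude and complete-linkage rules can be expressed as a small predicate over the species pairs of a family's hits. This is a sketch under assumptions: `keep_family` is hypothetical, a family is reduced to a list of (species, species) pairs, "the group" for complete linkage is taken to be the included species, and the annotation-based options are not modeled.

```python
from itertools import combinations


def keep_family(hit_pairs, include=(), exclude=(), complete_linkage=False):
    """Decide whether a putative gene family passes the Include/Exclude filters.

    hit_pairs: list of (species_a, species_b), one entry per hit in the family.
    """
    present = {sp for pair in hit_pairs for sp in pair}
    if any(sp not in present for sp in include):
        return False   # every included species needs at least one hit
    if any(sp in present for sp in exclude):
        return False   # any hit to an excluded species discards the family
    if complete_linkage:
        linked = {frozenset(pair) for pair in hit_pairs}
        for a, b in combinations(include, 2):
            if frozenset((a, b)) not in linked:
                return False  # no hit links species a to species b
    return True


family = [("X", "Y"), ("Y", "Z")]  # hypothetical family with no X-Z hit
print(keep_family(family, include=("X", "Y", "Z")))                         # True
print(keep_family(family, include=("X", "Y", "Z"), complete_linkage=True))  # False
print(keep_family(family, include=("X",), exclude=("Z",)))                  # False
```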

Query Result Panel

   Sections:     1. Results table    2. Columns    3. Statistics    4. Top buttons    5. 3-chromosomes   

1. Results Table

A pair hits table
query results
  • Pair Hits:
    • The table contains columns for all of the selected species, but each hit only connects two species, and the other species columns are empty.
    • Each Hit# is only listed once unless minor genes are included (see Rules).
    • A gene may be listed more than once if it has multiple major hits.

  • Single genes:
    • If the query specified Single genes, then each row represents one gene and shows data only for one species.

You can sort the columns by clicking the column name in the table, and rearrange them by dragging the column name. You can add/remove columns using the Select Columns button at the bottom.

2. Columns

   Sections:    1. Pair hits columns    2. Single gene columns    3. Auto-save columns

query columns

The buttons on the bottom will be Select Columns and Hide Stats. If Select Columns is selected, it changes to Hide Columns and the Hide Stats is replaced with the 3 buttons explained below.
Clear  Clears the selection of all columns except Row#.
Defaults  Selects the default columns, which are shown in the image above.
Arrange  Arranges similar columns, putting the gene columns first.

In the column panel shown above, hover over a column name to see its brief description. Following are the full descriptions of the columns.

2.1 Pair hits columns

General
Row  Row number. This column does not sort.
Block  The synteny block containing this hit (if any). The format is Chr.Chr.Block#, where the two "Chr" are chromosome numbers.
Block Hits  The number of hits which comprise the synteny block.
Collinear  The collinear set containing this hit (if any). The format is Chr.Chr.Size.Set# (e.g. 1.2.5.100; there are 5 adjacent gene hits in set# 100 on Chr1 to Chr2).
Grp#  Gene#, Multi-hit gene, PgeneF: These three queries produce groups of hits, where each group has a group number. These numbers are generated during the search, so they will not be the same for different filter settings.
GrpSize  Gene#, Multi-hit gene, PgeneF: Size of the group for the corresponding Grp#.
Hit#  The number assigned to the hit. They are sequential along the chromosome of the alphabetically lesser species, e.g. Arab<Brap.
Hit %Id  Percent identity of the alignment. The value of the "Identity" column is from the MUMmer file. If the hit has subhits, then this is an approximation.
Hit %Sim  Percent similarity of the alignment (as determined by the BLOSUM scoring matrix). The value of the "Similarity" column is from the MUMmer file. If the hit has subhits, then this is an approximation.
Hit #Subs  The number of subhits in a clustered hit.
Hit St  If "=", both hit ends are to the same strand; if "!=", they are to different strands.
Hit Cov  The summed subhit lengths within a clustered hit, taking overlaps into account. The sums are usually different for the two sides; this will be the longest.
Hit Type  There are two alternative algorithms for clustering the hits on database creation, which assign different hit types, as follows:
Algo1: g2 (two genes), g1 (one gene), g0 (no genes).
Algo2: E is exon, I is intron, n is intergenic. There will be 2 characters, one for each gene, where the 1st letter goes with the alphabetically lesser project name;
e.g. 'EI' would indicate the hit covers an A.thal exon and a Cabbage intron.
Gene&Hit Info: one row for each species
Chr  Chromosome of the hit.
Gstart/Gend/Gst  Start and end of the annotated gene. The Gst is the strand (+/-).
Gene#  The gene number is C.#.{a-z}. The C is the chromosome number. The # is the sequential number along the chromosome. If a run of genes overlap, they receive the same gene number with different suffixes {a-z, a2-z2, etc}.
Hstart/Hend  Start and end of the hit region.
Hlen  Hend-Hstart+1
Olap  The value depends on which Cluster Hit algorithm was used.
Algo1: If any of the project pairs used Algo1, then this column will be the gene overlap.
Algo2: If all of the project pairs used Algo2, then this column will be the exon overlap.
Annotation: one row for each species
The keywords for the annotations of each species are listed; they can be different for each species. See GFF Attributes for modifying the keywords shown. The Anno Key Count can be modified at any time using symap (not viewSymap).
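
The Hit Cov description (summed subhits with overlaps counted once, then the longer of the two sides) can be illustrated with an interval merge. This is a sketch of that idea only: the function names and subhit coordinates are hypothetical, and coordinates are treated as inclusive, consistent with Hlen = Hend-Hstart+1.

```python
def covered_length(subhits):
    """Sum inclusive (start, end) intervals, counting overlapping bases once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(subhits):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start + 1  # close the previous run
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)           # extend the overlapping run
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def hit_cov(side1, side2):
    """Take the longer merged coverage of the two sides of a clustered hit."""
    return max(covered_length(side1), covered_length(side2))


side1 = [(100, 199), (150, 249), (400, 449)]  # first two subhits overlap
side2 = [(1000, 1099), (1200, 1249)]
print(covered_length(side1), covered_length(side2))  # 200 150
print(hit_cov(side1, side2))                         # 200
```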

2.2. Single genes columns

The single genes table only has the Gene Info and Annotation columns, with one additional column, as follows.
NumHits  This is the number of hits to the gene in the ENTIRE database, except for SELF synteny.
For example,
→ if Arab, Brap and Cabb species have all been compared for synteny,
→ and only Arab-Brap or Arab-Cabb are being queried, the orphan results will have some rows with NumHits>=1,
→ and the Arab-Brap-Cabb query will have all NumHits=0.
This is illustrated below, where gene# 1.2 is in the first table because it does NOT have a hit to Cabb, but it has NumHits=1 because it DOES have a hit to Brap.

Arab-Cabb orphans
Arab-Brap orphans
Arab-Brap-Cabb orphans

2.3 Columns and order shown

During a SyMAP session, when you display a new table, it will use the columns and order from the last table created or modified (add/remove columns).

The selected columns are saved between sessions (described below), but the order is not.

2.4 Auto-save columns

The column selection is saved in a file called .symap_saved_props in the user's home directory, so that the next time you run viewSymap, the table will show the same columns (but in their default order).

If you have multiple SyMAP databases, when you change between them the columns displayed are relative to the last SyMAP database queried (they may seem somewhat random for a different SyMAP database).

3. Statistics

Statistics for the query results are shown at the bottom of the results table. They can be hidden by selecting Hide Stats.

result stats

Most of the statistics are self-explanatory except the following:

Annotated and Genes: The first is the number of hits that overlap one or more genes, where a gene can have multiple hits. The second is the number of Genes with at least one hit.

Groups: This statistic is only shown if the Grp# column is populated.

Regions: This statistic is only shown if PgeneF was checked, and it is the number of distinct regions covered on that species.

4. Top buttons

   Sections    1. Show    2. Align    3. View 2D    4. Export...    5. Report...

query buttons

The Unselect All button deselects any selected rows.

4.1 Show

For the selected row, a popup will show all columns and associated information for the hit. The text in the popup can be copied.

4.2 Align

Select one or more rows. The sequences of the selected hit(s) are written out and a multiple alignment is created using MUSCLE (Edgar 2004 NAR:32). The figure on the right shows the MUSCLE alignment of four genes: 2 from B.rapa and 1 each from Cabbage and A.thal.

4.3 View 2D

query 2D

This displays the 2D view for the selected entry (see 3-chromosomes). The region displayed can be specified by the drop-down beside the View 2D button, as follows:

Option | Column* | Selected Hit | Highlight** | Display Filter
Region | N/A | The hit is padded to each side by the amount indicated in the kb text box. | Default | Show all hits.
Collinear | Collinear | The entire collinear set of hits for the selected hit will be shown. | Highlight2 | Show Block and CoSets, where CoSets is collinear sets.
Block | Block | The entire synteny block for the selected hit will be shown. | Default | Show Block Hits.
Group | Grp# | The entire group of hits for the selected hit will be shown. | Popup-query | Show all hits.
*The selected row must have a value for the column.
**See Color Icon.

High checkbox If selected, the selected hit is highlighted in the Popup-query color (default magenta). The coloring can also be turned off by selecting the 2D Hit Filter Hit popup (or Query) option.
Gene# checkbox If selected, the Gene# will be shown beside each gene in the 2D display, else the Annotation box will be shown (see 2D image on lower right).
Example:

The image on the right shows a Collinear set of 5. After the initial display, the 2D view can be changed as described in the User Guide.

 

The table below shows results when the Grp# column has a value (using Gene#, Multi-gene or PgeneF search). The Group option can be used with these rows.

query 2D columns

query synteny

4.4 Export...

One or more rows can be selected for the following exports; or if no rows are selected, the entire table is exported.

CSV: Export the rows using the selected set of columns to a CSV format suitable for import into Excel.

HTML: Export the rows using the selected set of columns to an HTML format suitable for viewing on the web (e.g. Example).

FASTA: (Pairs Only) For each row, the two hit sequences from the Hstart to Hend are written to file. NOTE: this is a very slow function and takes minutes if many rows are selected.

The Include Row column option is available because this column is always present, but it may be desirable to exclude it from the output.
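
An exported CSV can be post-processed with a few lines of script. The sketch below assumes the file starts with a header row whose names match the selected table columns (e.g. Hit#, Block); that layout is an assumption, not something stated in this guide, and the sample rows are invented so the example runs on its own.

```python
import csv
import io

# Invented sample standing in for an exported file; real exports may differ.
sample = "Hit#,Block,Hit %Id\n12,1.2.3,91\n13,,88\n14,1.2.3,95\n"


def rows_in_block(csv_text, block):
    """Return the rows whose Block column equals the given Chr.Chr.Block# value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row.get("Block") == block]


in_block = rows_in_block(sample, "1.2.3")
print([row["Hit#"] for row in in_block])  # ['12', '14']
```

To read a real export, replace the `io.StringIO` wrapper with an `open(path, newline="")` call on the saved file and check the header names first.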

result export

4.5 Report...

   Contents    Interface    Collinear report    Gene report    Group report   

The report is on the genes in the query table (last updated v5.6.2). This is most relevant when used with >2 species.

Interface

The menu shown on the right will have different options depending on the query performed, as follows:
Collinear size: the report will be by unions of collinear sets.
Multi-hit genes: the report will be by groups.
→ Otherwise, the report is by rows.

In all cases, a reference must be selected and the report is in reference to it. The SyMAP gene names are used in the report, which provides the chromosome numbers and order of genes. Other names can be shown by entering the appropriate keywords.

Gene Annotation Columns: One or more annotation keywords can be entered in a comma-delimited list (e.g. product, ID); the keywords must be found in the All_Anno keyword column. A column will be created containing the values of all entered keywords. Columns can be entered for any of the species (not just the reference).

report menu gene
Create
   Popup displays a panel of the results, which will look just like viewing the HTML file.
   HTML File writes a file that can be viewed as a web page.
      It is written in a human readable form such that anyone with HTML knowledge can edit it.
   TSV File writes a tab-separated-values file that can be viewed with Excel or any editor.

All species

Check box | Action | Conditions
Per row | Only rows with all species will be shown. | >2 species
Per collinear | For each union of overlapping collinear sets, the union must have all species. | >2 species; Collinear results
Show Collinear set | The non-reference species columns will include the #collinear size.collinear number. | Collinear results
The Per row option can result in no rows.

Collinear Report: Collinear sets are grouped to show the union of overlapping sets.

The HTML report on the right shows the first union of the A.thal collinear sets.

This was generated with Collinear all checked.
If Row all was checked, the 1st three rows would not be shown.

The '*' beside A.thal indicates it is the reference.

The '1 [3]' indicates it is the first union with 3 collinear sets.

For the non-reference gene columns, by default, the collinear set size.number is shown.

For example, row 4 lists A.thal gene 1.59, which is on Chr01 and aligns to the following:

Gene# | Species Chr | Collinear set
5153.a | B.rapa Chr09 | set 6 of size 6
28 | B.rapa Chr10 | set 104 of size 8
1 | Cabbage Chr05 | set 1 of size 8

The other genes in each collinear set are obvious since they share the same #N.M, e.g. #8.1 is shown in 8 rows. If Row all is checked, all genes in a collinear set may NOT be shown.

report A.thal ref

Gene Report: The report lists all genes from the reference species, and what genes each aligns to. The image below shows the top of two different reports, one with A.thal selected as reference and the other with B.rapa as reference.

The annotations correspond to the genes listed for the species, where the annotations are delimited by ";".
A "---" indicates that there is a gene with no corresponding annotation. This was produced from a query on 'zinc finger protein'; recall that the description only needs to be found in one gene's All_anno values, i.e. it is not necessarily in the reference gene of the report.


Group Report: This will only work with the Multi-gene query. It shows rows where the reference aligns to at least N genes, where N is the number input for multi-genes.

The Row all option works with it, but it does not require N genes for every reference row. This is illustrated in the image below.

The Grp# are shown beneath the reference gene. In the example below, Grp# 6 is the B.rapa group and Grp# 74 is the Cabbage group.


5. 3-chromosomes

Report is the only way to view more than two species in a query row (described immediately above).
→As shown here, it is possible to bring up the 2D display with 3 chromosomes from the query table (released v5.6.2).

Select two rows with the following requirements: (1) Both genes must exist in each row. (2) There must be a shared gene.
Select 2 rows

The two chromosomes with the unique genes may be from the same species or different species.
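
The two selection requirements can be written as a simple check. This is a literal reading of the rules above rather than SyMAP's own validation: `can_view_three` is a hypothetical name, each row is reduced to its two gene labels, and None stands for a hit end without a gene.

```python
def can_view_three(row1, row2):
    """Check two selected rows, each given as a (gene_a, gene_b) pair.

    (1) both genes must exist in each row; (2) the rows must share a gene.
    """
    if None in row1 or None in row2:
        return False
    return bool(set(row1) & set(row2))


# Hypothetical labels combining a species abbreviation with a Gene#.
shared = can_view_three(("Atha 1.59", "Brap 9.5153.a"), ("Atha 1.59", "Cabb 5.1"))
missing = can_view_three(("Atha 1.59", None), ("Atha 1.59", "Cabb 5.1"))
unrelated = can_view_three(("Atha 1.59", "Brap 9.5153.a"), ("Atha 1.60", "Cabb 5.1"))
print(shared, missing, unrelated)  # True False False
```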
Select 2 rows

Select View 2D with either the Region or Collinear option. The image on the right used the Collinear option.

 

2D Highlight Conserved Genes shows how to highlight shared or unique genes in the 2D display of 3 chromosomes.


> Results

query results tab

All query results are listed under the > Results tab on the left. Clicking this tab shows the table of queries illustrated above. Query results can be displayed by clicking a result on the left panel, or double-clicking it in the results table shown above.

The only way to remove query results from the left tab is by selecting them in this table followed by Remove Selected, or remove all with Remove All.

Email Comments To: symap@agcol.arizona.edu