Content:
See Release for the latest v5.6.1 and v5.6.2 changes.
See Terminology in the User Guide,
especially the Cluster Hit Algo1 versus Algo2 description.
> Instructions
To open the query interface, first select two or more sequence projects in the
Project Manager.
Then select the Queries button at the lower right to open the queries interface.
The Instructions window (on right) lists the projects which were selected for querying.
The Notes state what the Olap column represents: if one or more project pairs used Algo1, it is gene overlap;
if all project pairs used Algo2, it is exon overlap.
Open the Query Setup window by clicking on its tab in the left panel.
Sections:
1. General
2. Single genes
3. Pair Hits
4. Gene Groups
Set up the desired filters and then select Run Search to execute it.
When the search is complete, the query result panel will be displayed.
Rules:
1. Most filters can be used in conjunction with other
filters; options will be disabled if they cannot be used with a selected filter.
2. The Single queries return rows of genes rather than hits, and each row lists a gene for only one project.
3. All Pair hits queries return a row per hit pair, i.e.
an aligned region between two chromosomes from two projects.
4. All Hit# are uniquely numbered for a chromosome pair,
e.g. there will be a Hit# 3 on all chromosome pairs that have hits.
5. All Gene# are sequentially numbered per chromosome, e.g. there will be a
Gene# 3 on all chromosomes that have at least 3 non-overlapping genes. Overlapping
genes have the same number, but different suffixes, e.g. 4.b.
6. Major and Minor genes:
- A hit can align to multiple genes on either or both chromosomes; it is assigned to the best (major) gene on each end.
- For example, in the image on the right, the pink gene is the major gene as the hit (pink line)
fits it best, whereas the burgundy gene is the minor gene.
- When queries involve genes, only the hits to the major genes will be shown
except for the following queries, which will also list the minor genes:
Every*, Hit#,
Gene#, and Multi with Minor* checked.
Annotation Description
Enter a substring: the entire annotation string (i.e. column All_Anno) will be searched
for the substring.
Hits will be shown that have at least one gene, of two possible, with the annotation.
Location
Chr | Select a specific chromosome for the species.
From | The Hit Start coordinates for the selected chromosome will be >= this number.
To | The Hit End coordinates for the selected chromosome will be <= this number.
It is valid to enter only the From or To, or leave both blank.
The From and To are disabled for
Single genes, Block#, Collinear set#, Hit#, and Gene#.
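The From/To rule can be sketched in a few lines of Python. This is only an illustration of the rule described above, not SyMAP code; the function name in_location and its parameters are hypothetical, and a blank entry is represented by None.

```python
from typing import Optional


def in_location(hit_start: int, hit_end: int,
                frm: Optional[int] = None,
                to: Optional[int] = None) -> bool:
    """Return True if a hit passes the From/To filter.

    A bound left blank (None) is ignored, so entering only From,
    only To, or neither is valid.
    """
    if frm is not None and hit_start < frm:
        return False  # Hit Start must be >= From
    if to is not None and hit_end > to:
        return False  # Hit End must be <= To
    return True
```

For example, in_location(100, 200, frm=50) passes, while in_location(100, 200, to=150) does not.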
The Single queries produce rows of genes; there is no hit or block
information since the rows do not represent hits.
Options:
- Orphan genes (no hits)
Genes that do not have a hit and meet the additional filters.
The orphan genes are relative to the projects shown on the Instruction page.
For example, if species X, Y and Z have synteny computed between all pairs,
but only X and Y are selected, the orphan genes for X would be those with no hits to Y.
If X,Y&Z are selected, the orphan genes for X would be those with no hits to Y and Z.
- All genes (w/o hits), i.e. genes with and without hits
This shows all genes that meet the additional filters, regardless of whether they have a hit.
There is always the same set of genes for a project, regardless of synteny.
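The X, Y, Z example for orphan genes can be expressed with sets. The sketch below is an illustration only; the function orphan_genes and its data layout (a set of gene names plus a list of (gene, other species) hit pairs) are assumptions made for this example.

```python
def orphan_genes(all_genes, hits, target, selected):
    """Genes of `target` with no hit to any other selected species.

    all_genes: set of gene names for the target species
    hits:      iterable of (gene, other_species) pairs for the target
    selected:  species selected for the query (includes the target)
    """
    others = set(selected) - {target}
    # Only hits to the other *selected* species count.
    with_hit = {gene for gene, other in hits if other in others}
    return set(all_genes) - with_hit


x_genes = {"1.1", "1.2", "1.3"}
x_hits = [("1.1", "Y"), ("1.2", "Z")]
orphans_xy = orphan_genes(x_genes, x_hits, "X", {"X", "Y"})
orphans_xyz = orphan_genes(x_genes, x_hits, "X", {"X", "Y", "Z"})
```

With only X and Y selected, gene 1.2 is an orphan even though it has a hit to Z; with X, Y and Z selected, only gene 1.3 remains an orphan.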
Unselect species: When the Single or Gene# is selected, the check boxes beside the species names will be
activated. In order to view the genes from just one species, deactivate the others. Additionally,
a single chromosome can be selected, but location cannot be entered.
Each hit connects two species (projects) and hence represents a pair of aligned
regions for two of the selected species.
Filters are as follows:
In Block (Synteny Hit)
Yes | Only hits that are part of a synteny block will be returned. All hits will have a value for the Block column.
No | Only hits that are NOT part of a synteny block will be returned. No hits will have a value for the Block column.
Annotated (Gene Hit)
Every | Only hits that align to a gene on one or both sides of the hit will be shown. The Gene# columns will list the best gene.
Every* | This is like the Every option, but a hit may be listed multiple times if it aligns to minor genes. The minor Gene# will be suffixed with an "*" and the Hit Type is unknown.
One | Only hits that align to a gene on ONE end will be shown. The Gene# column will show the best gene for the end with the hit.
Both | Only hits that align to genes on BOTH ends will be shown. The Gene# column will show the best gene for both ends of the hit.
None | Only hits that do NOT align to a gene on either end will be shown. The Gene# column will not have a value for either end of the hit.
If One is selected together with a species chromosome,
the hit must be on that chromosome, but the "one" gene can be on either end. The same applies to
Every and Every*.
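The Annotated options can be summarized as a count of how many ends of a hit have a gene. The following is a minimal sketch, assuming One means exactly one end; the function passes_annotated is hypothetical, and Every* is omitted because it depends on minor genes.

```python
def passes_annotated(option, gene1, gene2):
    """Apply the Annotated (Gene Hit) filter to one hit.

    gene1, gene2: the best Gene# at each end of the hit, or None
    when that end does not align to a gene.
    """
    n_genes = (gene1 is not None) + (gene2 is not None)
    if option == "Every":
        return n_genes >= 1   # a gene on one or both ends
    if option == "One":
        return n_genes == 1   # assumption: exactly one end
    if option == "Both":
        return n_genes == 2
    if option == "None":
        return n_genes == 0
    raise ValueError(f"unknown option: {option}")
```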
|
Collinear size
>= [=, <=] N
|
| List all hits in collinear sets that have size >= N or = N or <= N, respectively.
|
Ignore
|
| Do not filter on collinear set sizes.
| See Collinear, which explains the SyMAP collinear sets.
|
For the following 4 filters, do not include the 'Chr' number. Instead use the chromosome pull-downs
to narrow the search to a specific chromosome, as shown in the Block# example.
Block#
Enter a single block number (the Block column is formatted Chr.Chr.Block#).
This will display all hits with this block number from any chromosome pair.
Example using the chromosome pull-downs: if you select Chr 1 from the first project, Chr 2 from a second project, and enter block=3, you will see
hits in block 1.2.3.
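The interaction between the block number and the chromosome pull-downs can be sketched as follows. This is an illustration only: the function matches_block is hypothetical, and it assumes the first Chr in the Block column corresponds to the first project's pull-down.

```python
def matches_block(block_value, block_num, chr1=None, chr2=None):
    """Check a Block column value (Chr.Chr.Block#) against the filter.

    block_num:  the number entered in the Block# field
    chr1, chr2: optional chromosome pull-down selections (as strings)
    """
    if not block_value:
        return False                      # hit is not in a block
    c1, c2, num = block_value.split(".")
    if int(num) != block_num:
        return False
    if chr1 is not None and c1 != chr1:
        return False
    if chr2 is not None and c2 != chr2:
        return False
    return True
```

Without a chromosome selected, block=3 matches both 1.2.3 and 4.7.3; with Chr 1 and Chr 2 selected, only 1.2.3 matches.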
Collinear Set#
Enter a set number (the Collinear column is formatted Chr.Chr.Size.Set#).
This will display all hits with this set number from any chromosome pair. See Block example.
Hit#
Enter a hit number. Both major and minor gene hits will be shown.
This will display all hits with this number from any chromosome pair. See Block example.
Gene#
Enter a Gene# number (the Gene# column is formatted Chr.Gene#.suffix, where suffix may be blank).
A gene will only show if it has a hit. All hits with the Gene# on either end will be shown.
If a gene has a suffix:
• If only a number is entered, all genes with the numeric prefix will be displayed (including minor hits).
• If a number.suffix is entered, the exact gene will be displayed, including minor hits.
Select a chromosome:
• The hit must have one end on the chromosome, but the Gene# can be on either end.
• By unselecting the other species (see Unselect species), the Gene# will only be on the specified end.
Group:
• When a gene hits multiple places on the same opposite chromosome, the hits are put in a group
(see Grp# column), which can be viewed with the
2D Group pull-down option (similar to Multi-hit).
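The number versus number.suffix matching for Gene# can be sketched as below. The function gene_matches is a hypothetical illustration; it assumes the Gene# column value has the form Chr.Gene#.suffix and that a minor gene may carry a trailing "*".

```python
def gene_matches(gene_col, query):
    """Match a Gene# column value against the entered Gene#.

    gene_col: 'Chr.Gene#' or 'Chr.Gene#.suffix' (optional trailing '*')
    query:    'N' (any suffix matches) or 'N.suffix' (exact gene)
    """
    parts = gene_col.rstrip("*").split(".")
    number = parts[1]
    suffix = parts[2] if len(parts) > 2 else ""
    q = query.split(".")
    if q[0] != number:
        return False
    # A bare number matches every suffix; otherwise the suffix must agree.
    return len(q) == 1 or q[1] == suffix
```

Entering 4 matches 1.4.a and 1.4.b, whereas entering 4.b matches only 1.4.b.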
Sections:
1. Multi-hit Genes
2. PgeneF (putative gene families)
These two options are computed on the fly.
They produce query results with values for the Grp# and GrpSize columns; this in turn
allows the View 2D Group option to be used.
4.1. Multi-hit Genes>= N
List all hits for genes that have >= N hits.
The target gene refers to the gene with >= N hits to the opposite species.
If Both Annotated Genes is selected, the multi-genes will be confined
to N gene hits.
The options are as follows:
Minor* | Include minor hits on either chromosome.
Tandem | (Annotated species only): The >=N hits must be to a tandem array of genes. Same Chr is automatic with this option.
Same Chr | The >=N hits must all be on the same opposite chromosome.
Diff Chr | The >=N hits may be on any set of opposite chromosomes.
The table can list the same hit multiple times: gene X and gene Y may be connected by a hit, and if both genes
have >=N hits, the hit needs to be shown in both groups.
An example is shown on the right, where Hit#881 is in both Atha Grp#3 and Brap Grp#15.
The image below shows the two groups, where the group hits are highlighted in magenta; these were
produced using the View 2D option Group (selecting any hit in the group results in the
same 2D display).
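The reason a hit can appear in two groups can be seen in a small sketch. This is not the SyMAP implementation; the function multi_hit_groups and the (hit number, gene on side A, gene on side B) layout are assumptions for illustration, and the Tandem, Same Chr and Diff Chr options are ignored.

```python
from collections import defaultdict


def multi_hit_groups(hits, n):
    """Group hits by gene and keep genes with >= n hits.

    hits: iterable of (hit_num, gene_a, gene_b)
    Returns {(side, gene): [hit_num, ...]} for each multi-hit gene.
    """
    by_gene = defaultdict(list)
    for hit_num, gene_a, gene_b in hits:
        by_gene[("A", gene_a)].append(hit_num)
        by_gene[("B", gene_b)].append(hit_num)
    return {gene: nums for gene, nums in by_gene.items() if len(nums) >= n}


hits = [(1, "a1", "b1"), (2, "a1", "b2"), (3, "a2", "b1")]
groups = multi_hit_groups(hits, 2)
```

Here hit 1 belongs to the group for gene a1 (hits 1 and 2) and to the group for gene b1 (hits 1 and 3), so it would be listed twice.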
4.2. PgeneF (putative gene families)
→ Note: for a long time this has only been tested superficially (i.e. checking that it gives the same results as the previous
release).
Using the hits that pass the other filters, SyMAP constructs putative gene families (PgeneFs)
spanning the selected species. This is done by grouping hits which overlap on at
least one genome.
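Grouping hits that overlap on at least one genome is a connected-components problem. The sketch below uses a simple union-find with pairwise comparison; it is an assumption about how such grouping could be done, not the SyMAP algorithm, and the names overlaps and pgenef_groups are hypothetical.

```python
def overlaps(r1, r2):
    """Regions are (species, chr, start, end) with inclusive coordinates."""
    return (r1[0] == r2[0] and r1[1] == r2[1]
            and r1[2] <= r2[3] and r2[2] <= r1[3])


def pgenef_groups(hits):
    """hits: list of (region_a, region_b). Returns one group label per hit."""
    parent = list(range(len(hits)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(hits)):
        for j in range(i + 1, len(hits)):
            # Link two hits if any of their ends overlap on one genome.
            if any(overlaps(a, b) for a in hits[i] for b in hits[j]):
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(hits))]


hits = [
    (("X", "1", 100, 200), ("Y", "2", 500, 600)),
    (("X", "1", 150, 250), ("Z", "3", 10, 90)),
    (("Y", "5", 1, 50), ("Z", "9", 1, 50)),
]
labels = pgenef_groups(hits)
```

The first two hits overlap on species X, so they share a group spanning X, Y and Z; the third hit forms its own group. The pairwise comparison is quadratic in the number of hits, which is acceptable for a sketch but not for large queries.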
Additional options are provided when >2 species are selected.
Note, if you have more than 6 species selected, this stage can take
an hour or more.
Each PgeneF is given a number, which is shown in the Query Results table (column name PgeneF).
The size of the PgeneF is also shown (column PgFSize).
Filters using the PgeneF values:
Include/Exclude
These filters permit searching for gene families shared by one group of species but not
present in another.
If a species is checked to include, then the PgeneF will only be retained if it includes
at least one hit which hits that species.
For >2 species only: If a species is checked to exclude,
then the PgeneF will be discarded if any of its hits are to that species.
For the included species:
No annotation to the included species
| Find PgeneFs which are not yet annotated. A PgeneF will be discarded if it is annotated on any of
the species which are checked in the Include line.
| Complete linkage if included species
| For >2 species only: Require the PgeneF to be fully linked, i.e. for each pair of species A and B in the group,
there must be a hit linking A to B.
| At least one hit for each included species
| For >2 species only: Only PgeneF hits will be shown if they have hits to the included species,
although the PgeneF numbers will reflect groupings created using all hits.
|
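The Include, Exclude and complete linkage rules can be sketched as a test applied to each PgeneF. This is an illustration under stated assumptions: the function keep_pgenef is hypothetical, each hit is reduced to the pair of species it connects, and complete linkage is checked over the included species.

```python
from itertools import combinations


def keep_pgenef(hits, include=(), exclude=(), complete_linkage=False):
    """Decide whether one PgeneF is retained.

    hits: list of (species_a, species_b) pairs, one per hit in the PgeneF
    """
    links = {frozenset(pair) for pair in hits}
    species = {s for pair in hits for s in pair}
    if not all(s in species for s in include):
        return False   # every included species needs at least one hit
    if any(s in species for s in exclude):
        return False   # any hit to an excluded species discards it
    if complete_linkage:
        # Each pair of included species must be linked by a hit.
        return all(frozenset((a, b)) in links
                   for a, b in combinations(include, 2))
    return True
```

For a PgeneF with hits A-B and B-C, including A and C retains it, excluding C discards it, and complete linkage over A, B and C fails because there is no A-C hit.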
Sections:
1. Results table
2. Columns
3. Statistics
4. Top buttons
5. 3-chromosomes
1. Results Table
A pair hits table
- Pair Hits:
- The table contains columns for all of the selected species, but each hit only connects two species, and the other species columns are empty.
- Each Hit# is only listed once unless minor genes are included (see Rules).
- A gene may be listed more than once if it has multiple major hits.
- Single genes:
- If the query specified Single genes, then each row represents one gene and shows data only
for one species.
You can sort the columns by clicking the column name in the table, and rearrange them by dragging the
column name. You can add/remove columns using the Select Columns button at the bottom.
Sections:
1. Pair hits columns
2. Single gene columns
3. Auto-save columns
The buttons on the bottom will be Select Columns and Hide Stats. If
Select Columns is selected, it changes to Hide Columns and the
Hide Stats is replaced with the 3 buttons explained below.
|
|
Clear | Clears the selection of all columns except Row#.
Defaults | Selects the default columns, which are shown in the image above.
Arrange | Arranges similar columns, putting the gene columns first.
In the column panel shown above, hover over a column name to see its brief description. Following
are the full descriptions of the columns.
General
| Row |
| Row number. This column does not sort.
| Block |
| The synteny block containing this hit (if any). The format is Chr.Chr.Block#, where
the two "Chr" are chromosome numbers.
| Block Hits |
| The number of hits which comprise the synteny block.
| Collinear |
| The collinear set containing this hit (if any). The format is Chr.Chr.Size.Set# (e.g. 1.2.5.100; there are 5
adjacent gene hits in set# 100 on Chr1 to Chr2).
| Grp# |
| Gene#, Multi-hit gene, PgeneF: These three queries produce groups of hits,
where each group has a group number. These numbers are generated during the search, so they will not be the same for different filter settings.
| GrpSize |
| Gene#, Multi-hit gene, PgeneF: Size of the group for the corresponding Grp#.
| Hit# |
| The number assigned to the hit. They are sequential along the chromosome of the alphabetically lesser species, e.g. Arab<Brap.
| Hit %Id |
| Percent identity of the alignment. The value of the "Identity" column is from the MUMmer file.
If the hit has subhits, then this is an approximation.
| Hit %Sim |
| Percent similarity of the alignment (as determined by the BLOSUM scoring matrix).
The value of the "Similarity" column is from the MUMmer file. If the hit has subhits, then
this is an approximation.
| Hit #Subs |
| The number of subhits in a clustered hit.
| Hit St |
| If "=", both hit ends are to the same strand; if "!=", they are to different strands.
| Hit Cov |
| The summed subhits within a clustered hit, taking overlaps into account. The summed subhits
are usually different for the two sides; this will be the longest.
| Hit Type |
| There are two alternative algorithms for clustering the hits on database creation, which assign
different hit types, as follows:
Algo1: g2 (two genes), g1 (one gene), g0 (no genes).
Algo2: E is exon, I is Intron, n is intergenic. There will be 2 characters, one for each gene,
where the 1st letter goes with the alphabetically lesser project name;
e.g. 'EI' would indicate the hit covers A.thal exon and Cabbage intron.
|
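An Algo2 Hit Type can be decoded as sketched below. The function decode_hit_type is hypothetical, and comparing project names case-insensitively to find the alphabetically lesser one is an assumption.

```python
ALGO2_CODES = {"E": "exon", "I": "intron", "n": "intergenic"}


def decode_hit_type(hit_type, project1, project2):
    """Map a 2-character Algo2 Hit Type onto the two project names.

    The 1st character goes with the alphabetically lesser project name.
    """
    if len(hit_type) != 2:
        raise ValueError("Algo2 hit type must have 2 characters")
    first, second = sorted([project1, project2], key=str.lower)
    return {first: ALGO2_CODES[hit_type[0]],
            second: ALGO2_CODES[hit_type[1]]}
```

For example, decode_hit_type("EI", "Cabbage", "A.thal") assigns exon to A.thal and intron to Cabbage.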
Gene&Hit Info: one row for each species
| Chr |
| Chromosome of the hit.
| Gstart/Gend/Gst |
| Start and end of the annotated gene. The Gst is the strand (+/-).
| Gene# |
| The gene number is C.#.{a-z}. The C is the chromosome number.
The # is the sequential number along the chromosome.
If a run of genes overlap, they receive the same gene number with different suffixes {a-z, a2-z2, etc}.
| Hstart/Hend |
| Start and end of the hit region.
| Hlen |
| Hend-Hstart+1
| Olap |
| The value depends on which Cluster Hit algorithm was used.
Algo1: If any of the project pairs used Algo1, then this column will be
the gene overlap.
Algo2: If all of the project pairs used Algo2, then this column will be
the exon overlap.
| Annotation: one row for each species
| The keywords for the annotations of each species are listed;
they can be different for each species.
See GFF Attributes
for modifying the keywords shown. The Anno Key Count can be modified at any time using symap (not viewSymap).
|
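The Hlen and Hit Cov columns can be illustrated with a short calculation. This is a sketch based on the column descriptions above, assuming inclusive coordinates; the function names are hypothetical and the subhit coordinates are made up for the example.

```python
def hlen(hstart, hend):
    """Hlen column: Hend-Hstart+1."""
    return hend - hstart + 1


def merged_length(subhits):
    """Sum subhit lengths on one side, counting overlapping bases once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(subhits):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end   # begin a new merged run
        else:
            cur_end = max(cur_end, end)       # extend the current run
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def hit_cov(side1_subhits, side2_subhits):
    """Hit Cov: the longer of the two per-side merged sums."""
    return max(merged_length(side1_subhits), merged_length(side2_subhits))
```

With subhits (1,10), (5,20) and (30,40) on one side, the merged length is 20 + 11 = 31, not the plain sum of 37.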
The single genes table only has the
Gene Info and Annotation columns, with one additional column, as follows.
NumHits
| This is the number of hits to the gene in the ENTIRE database, except for SELF synteny.
|
For example,
→ if Arab, Brap and Cabb species have all been compared for synteny,
→ and only Arab-Brap or Arab-Cabb are being queried, they will have some rows with NumHits>0,
→ and the Arab-Brap-Cabb query will have all NumHits=0.
This is illustrated below, where gene# 1.2 is in the first table
because it does NOT have a hit to Cabb, but it has NumHits=1 because it DOES have a hit to Brap.
Arab-Cabb orphans
|
| Arab-Brap orphans
|
| Arab-Brap-Cabb orphans
|
During a SyMAP session, when you display a new table,
it will use the columns and order from the last table created or modified (add/remove columns).
The selected columns are saved between sessions (described below), but the order is not.
2.4 Auto-save columns
The column selection is saved in a file called .symap_saved_props in the user's home directory so
that the next time you run viewSymap, the table will show the same columns (but in their default order).
If you have multiple SyMAP databases, when you change between them the columns displayed are relative to
the last SyMAP database queried (they may seem somewhat random for a different SyMAP database).
Statistics for the query results are shown at the bottom of the results table. They can be
hidden by selecting Hide Stats.
Most of the statistics are self-explanatory except the following:
Annotated and Genes: The first is the number of hits that
overlap one or more genes, where a gene can have multiple hits.
The second is the number of Genes with at least one hit.
Groups: This statistic is only shown if the Grp# column is populated.
Regions: This statistic is only shown if PgeneF was checked, and it is
the number of distinct regions covered on that species.
Sections
1. Show
2. Align
3. View 2D
4. Export...
5. Report...
The Unselect All unselects any selected rows.
For the selected row, a popup will show all columns and associated information for the hit.
The text in the popup can be copied.
Select one or more rows. The sequences of the selected hit(s) are written out and a multiple alignment is created
using MUSCLE (Edgar 2004 NAR:32). The figure on the right shows the MUSCLE alignment of
four genes: 2 from B.rapa and 1 from Cabbage and A.thal.
This displays the 2D view for the selected entry (see 3-chromosomes).
The region displayed can be specified by the drop-down beside the View 2D button, as follows:
Option | Column* | Selected Hit | Highlight** | Display Filter
Region | N/A | The hit is padded to each side by the amount indicated in the kb text box. | Default | Show all hits.
Collinear | Collinear | The entire collinear set of hits for the selected hit will be shown. | Highlight2 | Show Block and CoSets, where CoSets are collinear sets.
Block | Block | The entire synteny block for the selected hit will be shown. | Default | Show Block Hits.
Group | Grp# | The entire group of hits for the selected hit will be shown. | Popup-query | Show all hits.
*The selected row must have a value for the column.
**See Color Icon.
High checkbox
| If selected, the selected hit is highlighted in the Popup-query color (default magenta).
The coloring can also be turned off by selecting the 2D Hit Filter Hit popup (or Query) option.
| Gene# checkbox
| If selected, the Gene# will be shown beside each gene in the 2D display, else the Annotation box
will be shown (see 2D image on lower right).
The image on the right shows a Collinear set of 5.
After the initial display, the 2D view can be changed as described in the
User Guide.
The table below shows results when the Grp# column has a value
(using Gene#, Multi-gene or PgeneF search). The Group option
can be used with these rows.
One or more rows can be selected for the following exports; or if no rows are selected, the entire table is exported.
CSV:
Export the rows using the selected set of columns to a CSV format suitable for import into Excel.
HTML:
Export the rows using the selected set of columns to an HTML format suitable for viewing on the web
(e.g. Example).
FASTA: (Pairs Only) For each row, the two hit sequences from the Hstart to Hend
are written to file. NOTE: this is a very slow function and takes minutes if many rows are selected.
The Include Row column option is available because this column is always present, but
it may be desirable to not include it in the output.
Contents
Interface
Collinear report
Gene report
Group report
The report is on the genes in the query table (last updated v5.6.2).
This is most relevant when used with >2 species.
Interface
The menu shown on the right will have different options depending on the query performed, as follows:
→ Collinear size: the report will be by unions of collinear sets.
→ Multi-hit genes: the report will be by groups.
→ Otherwise, the report is by rows.
In all cases, a reference must be selected and the report is in reference to it.
The SyMAP gene names are used in the report, which provides the chromosome numbers and order of genes.
Other names can be shown by entering the appropriate keywords.
Gene Annotation Columns: One or more annotation keywords can be entered in a comma-delimited list (e.g. product, ID); the
keywords must be found in the All_Anno keyword column. A column will be created containing the values of all entered keywords.
Columns can be entered for any of the species (not just the reference).
Create
Popup displays a panel of the results, which will look just like viewing the HTML file.
HTML File writes a file that can be viewed as a web page.
It is written in a human readable form such that anyone with HTML knowledge can edit it.
TSV File writes a tab-separated-values file that can be viewed with Excel or
any editor.
All species
Check box | Action | Conditions
Per row | Only rows with all species will be shown. | >2 species
Per collinear | For each union of overlapping collinear sets, the union must have all species. | >2 species Collinear results
Show Collinear set | The non-reference species columns will include the #collinear size.collinear number. | Collinear results
The Per row option can result in no rows.
Collinear Report: Collinear sets are grouped to show the union of overlapping sets.
The HTML report on the right shows the first union of the A.thal collinear sets.
This was generated with Collinear all checked.
If Row all was checked, the 1st three rows would not be shown.
The '*' beside A.thal indicates it is the reference.
The '1 [3]' indicates it is the first union with 3 collinear sets.
For the non-reference gene columns, by default, the collinear set size.number is shown.
For example, row 4 lists A.thal gene 1.59, which is on Chr01 and aligns to the following:
Gene# | Species Chr | Collinear set
5153.a | B.rapa Chr09 | set 6 of size 6
28 | B.rapa Chr10 | set 104 of size 8
1 | Cabbage Chr05 | set 1 of size 8
The other genes in each collinear set are obvious since they share the same #N.M, e.g. #8.1 is shown in 8 rows.
If Row all is checked, all genes in a collinear set may NOT be shown.
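Building unions of overlapping collinear sets can be sketched as merging sets that share a reference gene. This is an assumption about what "overlapping" means here, not the SyMAP implementation; the function union_collinear_sets, the set labels and the gene numbers are hypothetical.

```python
def union_collinear_sets(sets_by_id):
    """Merge collinear sets that share at least one reference gene.

    sets_by_id: dict of set label -> set of reference Gene# values
    Returns a list of (sorted set labels, union of reference genes).
    """
    unions = []  # entries are mutually disjoint (labels, genes) pairs
    for set_id, genes in sets_by_id.items():
        merged_ids, merged_genes = [set_id], set(genes)
        remaining = []
        for ids, g in unions:
            if g & merged_genes:          # shares a reference gene
                merged_ids += ids
                merged_genes |= g
            else:
                remaining.append((ids, g))
        remaining.append((merged_ids, merged_genes))
        unions = remaining
    return [(sorted(ids), genes) for ids, genes in unions]


example = {
    "6.6": {"1.57", "1.58", "1.59"},
    "8.104": {"1.59", "1.60"},
    "3.7": {"1.90", "1.91"},
}
result = union_collinear_sets(example)
```

The first two sets share reference gene 1.59 and form one union of 2 collinear sets; the third set stays on its own.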
Gene Report: The report lists all genes from the reference species,
and what genes each aligns to. The image below shows the top of two different reports, one with
A.thal selected as reference and the other with B.rapa as reference.
The annotations correspond to the genes listed for the species, where the annotations are delimited by ";".
A "---" indicates that there is a gene with no corresponding annotation.
This was produced from
a query on 'zinc finger protein'; recall that the description only needs to be
found in one gene's All_anno values, i.e. it is not necessarily in the reference gene
of the report.
Group Report: This will only work with the Multi-gene query.
It shows rows where the reference aligns to at least N genes, where N is the number input for multi-genes.
The Row all option works with it, but it does not require N genes for every reference
row. This is illustrated in the image below.
The Grp# are shown beneath the reference gene.
In the example below, Grp# 6 is the B.rapa group and Grp# 74 is the
Cabbage group.
→ Report is the only way to view more than two species in a query row (described
immediately above).
→As shown here, it is possible to bring up the 2D display with 3 chromosomes from the query table (released v5.6.2).
Select two rows with the following requirements:
(1) Both genes must exist in each row.
(2) There must be a shared gene.
The two chromosomes with the unique genes may be from the same species or different species.
Select View 2D with either the Region or Collinear option. The
image on the right used the Collinear option.
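The two requirements for selecting rows can be written as a small check. The sketch below is illustrative only; the function can_view_three_chr and the (species, Gene#) representation are hypothetical, and it assumes exactly one gene is shared so that each row contributes one unique gene.

```python
def can_view_three_chr(row1, row2):
    """Check two selected rows for the 3-chromosome 2D display.

    Each row is a pair of genes, where a gene is (species, Gene#)
    or None when that end of the hit has no gene.
    """
    if any(gene is None for gene in row1 + row2):
        return False                 # (1) both genes must exist in each row
    shared = set(row1) & set(row2)
    return len(shared) == 1          # (2) there must be a shared gene


r1 = (("Arab", "1.59"), ("Brap", "9.5153.a"))
r2 = (("Arab", "1.59"), ("Cabb", "5.1"))
r3 = (("Arab", "1.60"), ("Cabb", "5.2"))
```

Rows r1 and r2 share the Arab gene 1.59 and qualify; r1 and r3 share no gene and do not.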
2D Highlight Conserved Genes
shows how to highlight shared or unique genes in the 2D display of 3 chromosomes.
All query results are listed under the > Results tab on the left.
Clicking this tab shows the table of queries illustrated above.
Query results can be displayed by clicking a result on the left panel,
or double-clicking it in the list of results table shown above.
The only way to remove query results from the left tab is by selecting them in this table followed
by Remove Selected, or remove all with Remove All.
|