Content:
See Terminology in the User Guide,
especially the Cluster Hit Algo1 versus Algo2 description.
>Instructions
To open the query interface, first select two or more sequence projects in the Project Manager.
Then select the Queries button to open the queries interface.
The Instructions window (above) lists the projects that were selected for querying. The last
line states what the Olap column represents: if one or more project pairs used Algo1, it is gene overlap;
if all project pairs used Algo2, it is exon overlap.
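The Olap rule can be written as a short check. This is only an illustrative sketch: the function name and the list of per-pair algorithm labels are assumptions, not part of SyMAP.

```python
def olap_meaning(pair_algorithms):
    """Return what the Olap column represents for the selected project pairs.

    pair_algorithms is a hypothetical list with one entry per project pair,
    each either "Algo1" or "Algo2".
    """
    # A single Algo1 pair switches the column to gene overlap for all pairs.
    if any(algo == "Algo1" for algo in pair_algorithms):
        return "gene overlap"
    # Otherwise every pair used Algo2.
    return "exon overlap"


print(olap_meaning(["Algo2", "Algo1", "Algo2"]))
print(olap_meaning(["Algo2", "Algo2"]))
```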
Open the Query Setup window by clicking on its tab in the left panel.
Sections:
1. General
2. Single genes
3. Pair Hits
4. Gene Groups
Set up the desired filters and then select Run Search to execute it.
When the query is complete, the query result panel will be displayed.
Rules
1. All queries are on the Pair hits unless Single is selected.
2. All Pair hits queries return a row per hit pair, i.e.
an aligned region between two chromosomes from two projects.
3. All Hit# are uniquely numbered for a chromosome pair, e.g. there will be a hit #3 on all chromosome pairs
that have at least 3 hits.
All Gene# are sequentially numbered per chromosome, e.g. there will be a
gene #3 on all chromosomes that have at least 3 non-overlapping genes.
4. Major and Minor genes:
a. A hit can align to multiple genes on either or both chromosomes; it is assigned to the best (major) gene on each end.
b. For example, in the image on the right, the burgundy gene is the major gene as the hit (green line)
fits to it best,
whereas the blue single exon gene is the minor gene.
c. When queries involve genes, only the hits to the major genes will be shown
except for the following queries, which will also list the minor genes:
Every*, Hit#,
Gene# when no suffix is provided, and Multi with minor checked.
5. Most filters can be used in conjunction with other
filters; options will be disabled if they cannot be used with a selected filter.
Annotation Description
Enter a substring: the entire annotation string (i.e. column All_Anno) will be searched
for the substring. Hits will be returned that align to the genes with the corresponding annotation.
Location
Chr | Select a specific chromosome for the species. This is the most common query to use in conjunction with others.
From | The Start coordinates for the selected chromosome will be >= this number.
To | The End coordinates for the selected chromosome will be <= this number.
For Pair hits, the hit coordinates must pass the From and To input integers.
For Single genes, the gene coordinates must pass the From and To input integers.
It is valid to only enter the From or To, or no value for either (leave blank).
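The From/To test can be sketched as follows. The function and parameter names are hypothetical; None stands for a blank input box.

```python
def passes_location(start, end, from_coord=None, to_coord=None):
    """Return True if a hit (or gene) passes the From/To location filter.

    start/end are the hit coordinates for Pair hits, or the gene coordinates
    for Single genes. A bound of None means the box was left blank.
    """
    if from_coord is not None and start < from_coord:
        return False  # Start must be >= From
    if to_coord is not None and end > to_coord:
        return False  # End must be <= To
    return True


print(passes_location(1500, 2500, from_coord=1000, to_coord=3000))
print(passes_location(1500, 2500, to_coord=2000))
```

Leaving both bounds as None accepts every row on the selected chromosome.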
Single queries produce rows of genes; there is no hit or block
information since the rows do not represent hits.
Options:
- Orphan genes (no hits)
Genes that do not have a hit and meet the additional filters.
The orphan genes are relative to the projects shown on the Instruction page.
For example, if species X, Y and Z have synteny computed between all pairs,
but only X and Y are selected, the orphan genes for X would be those with no hits to Y.
If X,Y&Z are selected, the orphan genes for X would be those with no hits to Y and Z.
- All genes (w/o hits), i.e. genes with and without hits
This shows all genes that meet the additional filters, regardless of whether they have a hit.
There is always the same set of genes for a project, regardless of synteny.
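The X, Y, Z example for orphan genes can be sketched as below. The data layout (a dictionary of gene sets and a list of hit pairs) is an assumption made for illustration, and the sketch reads "no hits to Y and Z" as "no hit to either Y or Z".

```python
def orphan_genes(genes, hits, species, selected):
    """Return the genes of `species` with no hit to any other selected species.

    genes: dict mapping species name -> set of gene ids (hypothetical layout)
    hits:  iterable of ((species_a, gene_a), (species_b, gene_b)) pairs
    """
    others = set(selected) - {species}
    with_hit = set()
    for (sp_a, gene_a), (sp_b, gene_b) in hits:
        if sp_a == species and sp_b in others:
            with_hit.add(gene_a)
        elif sp_b == species and sp_a in others:
            with_hit.add(gene_b)
    return genes[species] - with_hit


genes = {"X": {"x1", "x2", "x3"}}
hits = [(("X", "x1"), ("Y", "y1")), (("X", "x2"), ("Z", "z1"))]
print(sorted(orphan_genes(genes, hits, "X", ["X", "Y"])))
print(sorted(orphan_genes(genes, hits, "X", ["X", "Y", "Z"])))
```

With only X and Y selected, x2 counts as an orphan even though it hits Z; adding Z to the selection removes it from the orphan list.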
Allowed filters for singles: both options can be restricted by:
- Using the Project pull-down, select a project.
- If a project is selected, then a chromosome and optional location can be selected.
- An Annotation Description may be entered.
Each hit connects two species (projects) and hence represents a pair of aligned
regions for two of the selected species. Filters are as follows:
In Block (Synteny Hit)
Yes | Only hits that are part of a synteny block will be returned. All hits will have a value for the Block column.
No | Only hits that are NOT part of a synteny block will be returned. No hits will have a value for the Block column.
Annotated (Gene Hit)
Every | Only hits that align to a gene on one or both sides of the hit will be shown. The Gene# columns will list the best gene.
Every* | This is like the Every option, but a hit may be listed multiple times if it aligns to overlapping genes; in other words, ALL genes will be shown if they have any alignment. The Gene# for non-best hits will be suffixed with an "*" and the Htype is unknown.
One | Only hits that align to a gene on ONE end will be shown. The Gene# column will show the best gene for the end with the hit.
Both | Only hits that align to genes on BOTH ends will be shown. The Gene# column will show the best gene for both ends of the hit.
None | Only hits that do NOT align to a gene on either end will be shown. The Gene# column will not have a value for either end of the hit.
Location: if a chromosome is selected, every hit row must be on the selected chromosome, but the "every" or "one"
gene may be on the opposite chromosome.
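The Every, One, Both and None options amount to counting how many ends of the hit have a gene. The sketch below uses hypothetical names, represents a missing gene as None, and reads One as "exactly one end". Every* is omitted because it changes which rows are listed rather than which hits pass.

```python
def passes_annotated(option, gene_end1, gene_end2):
    """Return True if a hit passes the Annotated (Gene Hit) filter.

    gene_end1/gene_end2 are the best gene on each end of the hit,
    or None when that end does not align to a gene.
    """
    annotated_ends = (gene_end1 is not None) + (gene_end2 is not None)
    if option == "Every":
        return annotated_ends >= 1
    if option == "One":
        return annotated_ends == 1
    if option == "Both":
        return annotated_ends == 2
    if option == "None":
        return annotated_ends == 0
    raise ValueError(f"unknown option: {option}")


print(passes_annotated("Every", "1.12", None))
print(passes_annotated("Both", "1.12", None))
```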
Collinear size
>= [=, <=] N | List all hits in collinear sets that have size >= N, = N, or <= N, respectively. The text box must contain an integer > 0.
Ignore | Do not filter on collinear set sizes.
See Collinear, which explains the SyMAP collinear sets.
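A minimal sketch of the collinear size comparison, assuming the three operators described above; the function name and the string form of the operator are illustrative only.

```python
import operator

# Map the pull-down choice to a comparison function.
_SIZE_OPS = {">=": operator.ge, "=": operator.eq, "<=": operator.le}


def passes_collinear_size(set_size, op, n):
    """Return True if a collinear set of `set_size` hits passes `op N`."""
    if n <= 0:
        raise ValueError("N must be an integer > 0")
    return _SIZE_OPS[op](set_size, n)


print(passes_collinear_size(5, ">=", 3))
print(passes_collinear_size(5, "=", 3))
```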
Block#
Enter a single block number (the Block column is formatted
Chr.Chr.Block#). Use it in conjunction with the chromosome pull-downs. For example,
if you select Chr 1 from the first project, Chr 2 from a second project, and enter block=3, you will see
hits in block 1.2.3 from the two project chromosomes you selected.
Collinear Set#
Enter a set number (the Collinear column is formatted Chr.Chr.Size.Set#). All hits in a collinear set have the same Set#, which is the last number of the Collinear column.
This can be used in conjunction with the chromosome pull-downs (as for blocks).
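When working with exported results, the Block and Collinear values can be split into their parts. The sketch below assumes the chromosome names themselves contain no "." character; the class and function names are hypothetical.

```python
from typing import NamedTuple


class Block(NamedTuple):
    chr1: str
    chr2: str
    block: int


class CollinearSet(NamedTuple):
    chr1: str
    chr2: str
    size: int
    set_num: int


def parse_block(value):
    """Split a Block value formatted Chr.Chr.Block#."""
    chr1, chr2, block = value.split(".")
    return Block(chr1, chr2, int(block))


def parse_collinear(value):
    """Split a Collinear value formatted Chr.Chr.Size.Set#."""
    chr1, chr2, size, set_num = value.split(".")
    return CollinearSet(chr1, chr2, int(size), int(set_num))


print(parse_block("1.2.3"))
print(parse_collinear("1.2.5.100"))
```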
Hit#
The hits are numbered for each chromosome pair, e.g. there will be a Hit#1 for every chromosome pair that
has hits. If a hit aligns to multiple genes, both major and minor genes will be shown.
Gene#
Enter a Gene# number (the Gene# column is formatted Chr.Gene#.suffix
where the suffix may be blank); do not include the 'Chr' number.
• If only a number is entered, all genes with the numeric prefix will be displayed (including minor hits).
• If a number.suffix is entered, the exact gene will be displayed (but no minor hits).
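The two forms of Gene# input can be sketched as a match against the Gene# column. This is an illustration under assumptions: the column value is taken to be Chr.Gene# or Chr.Gene#.suffix, and the function name is hypothetical.

```python
def matches_gene_query(gene_column, query):
    """Return True if a Gene# column value matches the entered query.

    gene_column: value such as "5.1458" or "5.1458.a" (Chr.Gene#.suffix)
    query:       "1458" (any suffix) or "1458.a" (exact gene), without 'Chr'
    """
    parts = gene_column.split(".")
    number = parts[1]
    suffix = parts[2] if len(parts) > 2 else ""
    q_number, _, q_suffix = query.partition(".")
    if q_suffix:
        # number.suffix: only the exact gene matches
        return number == q_number and suffix == q_suffix
    # number only: every gene sharing the number matches
    return number == q_number


print(matches_gene_query("5.1458.a", "1458"))
print(matches_gene_query("5.1458.b", "1458.a"))
```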
This query results in values for the Grp# and GrpSize columns; this in turn
allows the View 2D Group option to be used.
These two options are computed on the fly.
They produce query results with values for the Grp# and GrpSize columns; this in turn
allows the View 2D Group option to be used.
4.1 Multi-hit Genes>= N
List all hits to genes that have >= N hits, where multiple hits can join the same two genes.
The target gene refers to the gene with >= N hits to the opposite species.
The options are as follows:
Exon | (Algo2 only) The hits must be to exons in the target gene.
Minor* | Include minor hits on either chromosome.
Opposite
Same Chr | The >=N hits must all be on the same opposite chromosome.
Tandem | (Annotated species only) The >=N hits must be to a tandem array of genes. Same Chr is automatic with this option.
Good associated filters:
| Location Chr:
| When used with a specified chromosome,
restricts the target genes with >= N hits to the specified chromosome.
For example:
- If species1 Chr02 is selected, there will only be gene
groups shown from Chr02 to any chromosome in species2.
- If both species1 Chr02
and species2 Chr03 are selected, all gene groups from
species1 Chr02 to species2 Chr03 will be shown, and
vice versa.
| | Annotated Both
| | Each target gene must have its N hits to genes on the opposite species (not needed with the
Tandem option).
| |
Table can list the same hit multiple times:
For all other queries, a hit will only be listed once for a chromosome pair; however, that is not
the case for this query. That is because gene X and gene Y may be connected by a hit, where both gene X
and gene Y have >=N, so both groups need to be shown.
An example is shown on the right, where the
two highlighted rows are the same, but in two different groups (grp#1 for gene 5.1458 and grp#4 for gene 3.638).
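The reason a hit can appear in two groups can be seen in a small sketch. This is not SyMAP's implementation; the hit tuples and gene labels are made up for illustration, and the options (Exon, Minor*, Same Chr, Tandem) are ignored.

```python
from collections import defaultdict


def multi_hit_groups(hits, n):
    """Group hits by gene and keep the genes that have >= n hits.

    hits: list of (hit_id, gene_on_species1, gene_on_species2)
    Returns a dict mapping each target gene to the list of its hit ids.
    """
    by_gene = defaultdict(list)
    for hit_id, gene_a, gene_b in hits:
        by_gene[gene_a].append(hit_id)
        by_gene[gene_b].append(hit_id)
    return {gene: ids for gene, ids in by_gene.items() if len(ids) >= n}


hits = [
    (1, "sp1:geneX", "sp2:geneY"),
    (2, "sp1:geneX", "sp2:geneZ"),
    (3, "sp1:geneW", "sp2:geneY"),
]
for gene, hit_ids in multi_hit_groups(hits, 2).items():
    print(gene, hit_ids)
```

Hit 1 joins geneX and geneY, and both genes have two hits, so it is listed once in each gene's group.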
View 2D Group Example:
The image below shows the two groups, where the group hits are highlighted in magenta; these were
produced using the View 2D option Group.
4.2 PgeneF (putative gene families)
Note: this has not been tested for a long time except superficially (i.e. making sure it has the same results as the previous
release).
Using the hits that pass the other filters, SyMAP constructs putative gene families (PgeneFs)
spanning the selected species. This is done by grouping hits which overlap on at
least one genome.
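Grouping hits that overlap on at least one genome can be pictured as building connected components. The sketch below is one way to do that with a small union-find; it is an assumption about the general idea, not SyMAP's actual code, and the region tuples are hypothetical.

```python
def regions_overlap(r1, r2):
    """Regions are (species, chromosome, start, end) with inclusive coordinates."""
    sp1, chr1, s1, e1 = r1
    sp2, chr2, s2, e2 = r2
    return sp1 == sp2 and chr1 == chr2 and s1 <= e2 and s2 <= e1


def group_hits(hits):
    """Assign a group number to each hit; hits sharing an overlap share a group.

    hits: list of (region_a, region_b), one region per end of the hit.
    Returns a list of 1-based group numbers parallel to `hits`.
    """
    parent = list(range(len(hits)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Simple quadratic comparison; fine for a sketch, slow for large inputs.
    for i in range(len(hits)):
        for j in range(i + 1, len(hits)):
            if any(regions_overlap(a, b) for a in hits[i] for b in hits[j]):
                parent[find(j)] = find(i)

    labels = {}
    result = []
    for i in range(len(hits)):
        root = find(i)
        labels.setdefault(root, len(labels) + 1)
        result.append(labels[root])
    return result


hits = [
    (("X", "1", 100, 200), ("Y", "2", 500, 600)),
    (("X", "1", 150, 250), ("Z", "1", 10, 90)),
    (("Y", "3", 100, 200), ("Z", "4", 100, 200)),
]
print(group_hits(hits))
```

The first two hits overlap on species X, so they fall into the same group even though their other ends are on different species.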
Additional options are provided when >2 species are selected.
Note, if you have more than 6 species selected, this stage can take
an hour or more.
Each PgeneF is given a number, which is shown in the Query Results table (column name PgeneF).
The size of the PgeneF is also shown (column PgFSize).
Filters using the PgeneF values:
Include/Exclude
These filters permit searching for gene families shared by one group of species but not
present in another.
If a species is checked to include, then the PgeneF will only be retained if it includes
at least one hit to that species.
For >2 species only: If a species is checked to exclude,
then the PgeneF will be discarded if any of its hits are to that species.
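The Include/Exclude rule can be sketched as a test on the set of species touched by a PgeneF's hits. The function and argument names are hypothetical.

```python
def keep_family(family_species, include=(), exclude=()):
    """Return True if a PgeneF passes the Include/Exclude filters.

    family_species: species that at least one hit in the PgeneF touches
    include:        species checked to include
    exclude:        species checked to exclude (>2 species only)
    """
    species = set(family_species)
    if any(sp not in species for sp in include):
        return False  # a required species has no hit in this family
    if any(sp in species for sp in exclude):
        return False  # the family has a hit to an excluded species
    return True


print(keep_family({"X", "Y"}, include=["X", "Y"], exclude=["Z"]))
print(keep_family({"X", "Y", "Z"}, include=["X", "Y"], exclude=["Z"]))
```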
For the included species:
No annotation to the included species
| Find PgeneFs which are not yet annotated. A PgeneF will be discarded if it is annotated on any of
the species which are checked in the Include line.
| Complete linkage if included species
| For >2 species only: Require the PgeneF to be fully linked, i.e. for each pair of species A and B in the group,
there must be a hit linking A to B.
| At least one hit for each included species
| For >2 species only: Only PgeneF hits will be shown if they have hits to the included species,
although the PgeneF numbers will reflect groupings created using all hits.
|
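The complete linkage requirement can be sketched as a check over every pair of included species. The input (a list of species pairs, one per hit in the PgeneF) is an assumed representation.

```python
from itertools import combinations


def fully_linked(hit_species_pairs, included):
    """Return True if every pair of included species is linked by a hit.

    hit_species_pairs: iterable of (species_a, species_b), one per hit
    included:          the species checked in the Include line
    """
    linked = {frozenset(pair) for pair in hit_species_pairs}
    return all(
        frozenset((a, b)) in linked for a, b in combinations(included, 2)
    )


print(fully_linked([("X", "Y"), ("Y", "Z")], ["X", "Y", "Z"]))
print(fully_linked([("X", "Y"), ("Y", "Z"), ("Z", "X")], ["X", "Y", "Z"]))
```

In the first call X and Z are only connected through Y, so the family is not fully linked.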
Sections:
1. Results table
2. Columns
3. Top buttons
4. Statistics
1. Results Table
A pair hits table
- Pair Hits:
- The table contains columns for all of the selected species, but each hit only connects two species, and the other species columns are empty.
- Each Hit# is only listed once unless minor genes are included (see Rules).
- A gene may be listed more than once if multiple hits align to it with a best overlap (i.e. major gene with multiple hits).
- Single genes:
- If the query specified Single genes, then each row represents one gene and shows data only
for one species.
You can sort the columns by clicking the column name in the table, and rearrange them by dragging the
column name. You can add/remove columns using the Select Columns button at the bottom.
Sections:
Pair hits columns
Single gene columns
Auto-save columns
The buttons on the bottom will be Select Columns and Hide Stats. If
Select Columns is selected, it changes to Hide Columns and the
Hide Stats is replaced with Clear and Defaults.
|
|
In the column panel shown above, hover over a column name to see its brief description. Following
are the full descriptions of the columns.
General
Row | Row number within the table.
Block | Synteny block containing this hit (if any). The format is Cn.Cm.Block#, where Cn and Cm are the chromosome numbers.
Block Hits | The number of hits that comprise the synteny block.
Collinear | Collinear set containing this hit (if any). The format is Cn.Cm.Size.Set# (e.g. 1.2.5.100: there are 5 adjacent gene hits in set# 100 on Chr1 to Chr2).
Grp# | If Gene#, Multi-hit gene, PgeneF: These three queries produce groups of hits, where each group has a group number. These numbers are generated during the search, so they will not be the same for different filter settings.
GrpSize | If Gene#, Multi-hit gene, PgeneF: Size of the group for the corresponding Grp#.
Hit# | The hit number; hits are numbered sequentially for each chromosome pair. This number is shown on the Chromosome Explorer when the mouse is over the hit line.
Hit %Id | Percent identity of the alignment. The value of the "Identity" column is from the MUMmer file. If the hit has subhits, then this is an approximation.
Hit %Sim | Percent similarity of the alignment (as determined by the BLOSUM scoring matrix). The value of the "Similarity" column is from the MUMmer file. If the hit has subhits, then this is an approximation.
Hit #Subs | The number of subhits in a clustered hit.
Hit St | If "=", both hit ends are to the same strand; if "!=", they are to different strands.
Hit Cov | The lengths of the subhits within a clustered hit are summed, taking overlaps into account; the larger of the two sides' sums is shown. See Clustered Hits in Terminology.
Hit Type | There are two alternative algorithms for clustering the hits on database creation.
Algo1: g2 (two genes), g1 (one gene), g0 (no genes).
Algo2: E is exon, I is intron, n is intergenic. There will be 2 characters, one for each project,
where the 1st letter goes with the alphabetically lesser project name; e.g. 'EI' indicates the hit covers A.thal exon(s)
and Cabbage intron(s).
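The Hit Cov description (sum the subhit lengths while accounting for overlaps, then take the larger side) can be sketched as an interval merge. This is an illustration, not SyMAP's code; it assumes inclusive coordinates, in line with Hlen being Hend-Hstart+1.

```python
def covered_length(subhits):
    """Total bases covered by (start, end) subhits, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(subhits):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval, if any, and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping subhit: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def hit_cov(subhits_side1, subhits_side2):
    """Return the larger covered length of the two sides of a clustered hit."""
    return max(covered_length(subhits_side1), covered_length(subhits_side2))


print(covered_length([(1, 10), (5, 15), (20, 24)]))
print(hit_cov([(1, 10), (5, 15), (20, 24)], [(100, 110)]))
```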
Gene&Hit Info: one row for each species
Chr | Chromosome of the hit.
Gstart/Gend/Gst | Start and end of the annotated gene. Gst is the strand (+/-).
Gene# | The gene number is C.#.{a-z}. The C is the chromosome number. The # is the sequential number along the chromosome. If a run of genes overlap, they receive the same gene number with different suffixes {a-z, a2-z2, etc}. This is shown on the Chromosome Explorer "Annotation Description" and when the mouse hovers over the gene.
Hstart/Hend | Start and end of the hit region.
Hlen | Hend-Hstart+1
Olap | The value depends on which Cluster Hit algorithm was used.
Algo1: If any of the project pairs used Algo1, then this column will be
the gene overlap for all project pairs.
Algo2: If all of the project pairs used Algo2, then this column will be
the exon overlap.
NOTE: For gene overlap, Algo2 takes into account gaps between the subhits, whereas
Algo1 does not.
Annotation: one row for each species
|
The keywords for the annotations of each species are listed;
they can be different for each species.
See Project Parameters
for modifying the keywords shown. The Anno Key Count can be modified at any time using symap (not viewSymap).
|
The single genes tables only have the
Gene Info and Annotation columns, with one additional column, as follows.
NumHits
| This is the number of hits to the gene in the ENTIRE database, except for SELF synteny.
For example, if X, Y
and Z species have all been compared for synteny, but only Y and Z are being queried,
the Orphans for Y can have NumHits>0 if it hits X.
|
Single result from 3 selected species
| Single result from 2 selected species
|
In the above tables, the A.thal gene #2 is in the 2-species table because it has no hit to Cabbage; but it is not in
the 3-species table because it does have a hit to A.thal. The value of NumHits is the same in both. That is,
both tables show orphan genes relative to the selected set, but NumHits lets the user know whether a gene is an
orphan over all the computed sets.
The other query on singles is for All genes, which provides the same columns.
2.3 Auto-save columns
The columns selection is saved in a file called .symap_saved_props in the user's home directory.
If you only have one SyMAP database, there is no need to read the following:
Say the last database Dn queried had N species,
and the current database Dm has M species, where Dn and Dm are different
SyMAP databases:
- If N=M: the General and Gene&Hit Info columns will
be set according to the previous settings, but the Annotation columns may be wrong.
- If N>M or N<M: the General columns will be set according to the previous settings,
all Gene&Hit Info columns will be set like the first Dn species, and no
Annotation columns will be set.
Sections
Show
Align
View 2D
Export
Report
Show: For the selected row, a popup will show all columns and associated information for the hit.
The text in the popup can be copied.
Align:
Select one or more rows. The sequences of the selected hit(s) are written out and a multiple alignment is created
using MUSCLE (Edgar 2004 NAR:32). The figure on the right shows the MUSCLE alignment of
two genes from B.rapa and genes from Cabbage and A.thal.
View 2D: This displays the 2D view for the selected entry. The following 4 views
are provided using the drop-down (the image on the right shows a Collinear set of 5).
Columns for 3 of the 4 options:
|
|
Option | Column* | Selected Hit | Display Filter
Region (kb) | N/A | The hit is padded on each side by the amount indicated (default 50kb). | Show all hits.
Synteny block | Block | The entire block for the hit will be shown. | Show Block Hits.
Collinear set | Collinear | The collinear set for the hit will be shown. Highlight Collinear Set will be automatically selected. | Show Block and Set Hits.
Group | Grp# | The group for the hit will be shown; its hits are required to be on the same chromosome pair as the selected row. The hits are highlighted in the "Special" color, e.g. Example. See High below for disabling the highlighting. | Show all hits.
*The selected row must have a value for the column.
After the initial display, the 2D view can be changed as described in the
User Guide.
If the High checkbox is selected, the selected hit is highlighted in the
"Special" color (default magenta), and can be changed
in the Color Icon.
The coloring can also be turned off by selecting the 2D Hit Filter Hit popup (or Query)
option.
Export...: One or more rows can be selected for the following exports;
or if no rows are selected, the entire table is exported.
CSV:
Export the rows using the selected set of columns to a CSV format suitable for import into Excel.
HTML:
Export the rows using the selected set of columns to an HTML format suitable for viewing on the web
(e.g. Example).
FASTA: (Pairs Only) Sequences from the rows are written to a FASTA file.
Both sides of each hit are written using the start/end coordinates shown in the table.
NOTE: this is a very slow function and takes minutes if many rows are selected.
The Include Row column option is available because this column is always present, but
it may be desirable to not include it in the output.
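An exported CSV file can also be post-processed outside of Excel. The sketch below uses Python's standard csv module and keeps the rows that have a Block value. It assumes the header names in the exported file match the column names shown in the table, which has not been verified here; the sample text is made up.

```python
import csv
import io


def rows_in_blocks(csv_text, block_column="Block"):
    """Return the exported rows whose block column is not empty.

    csv_text:     contents of an exported CSV file, header row included
    block_column: assumed header name for the Block column
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if (row.get(block_column) or "").strip()]


# Made-up sample standing in for an exported file.
sample = "Row,Block,Hit#\n1,1.2.3,10\n2,,11\n"
for row in rows_in_blocks(sample):
    print(row["Hit#"], row["Block"])
```

For a real export, read the file contents with open(path, newline="") and pass the text to the function.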
Report...: The report is on the genes in the query table.
This is most relevant when used with the Collinear size query with >2 species.
If the table was generated using the Collinear size query, the report will be on the collinear sets per reference gene,
else it creates a gene report. In both cases, a reference must be selected and keywords are optional.
The SyMAP gene names are used in the report (they provide the chromosome number and the order of the gene), but
other names can be shown by entering the appropriate keywords.
Gene Annotation Keys: One or more annotation keywords can be entered in a comma-delimited list (e.g. product, ID); the
keywords must be found in the All_Anno keyword column. A column will be created containing the values of all entered keywords.
SyMAP does not check the supplied keywords for correctness!
Collinear Report: Collinear sets are grouped to show overlapping sets.
|
| For example, in the above report, gene 4.b on Chr01 aligns to B.rapa gene 63.a on Chr01 in collinear set 95 with
7 members, cabbage gene 5521 on Chr08 in collinear set 1 with 3 members, and cabbage gene 79 on Chr05 in collinear set 459 with
9 members.
| Uncheck Show Collinear Set to not include the collinear set information.
|
Gene Report: The report lists all genes from the reference species, and what genes each aligns to
for the other species. Only genes listed in the query table will be shown in this report.
Statistics for the query results are shown at the bottom of the results table. They can be
hidden by selecting Hide Stats.
Most of the statistics are self-explanatory except the following:
Annotated and Genes: The first is the number of hits that
overlap one or more genes, where a gene can have multiple hits.
The second is the number of Genes with at least one hit.
Groups: This statistic
is only shown if Multi-hit genes was checked, and it is
the number of distinct groups per species (along with the last Grp# for the species).
Regions: This statistic
is only shown if PgeneF was checked, and it is
the number of distinct regions covered on that species.
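The difference between the Annotated and Genes statistics can be sketched as two counts over the same rows. The row layout is hypothetical, and gene ids are assumed to be unique across species.

```python
def annotated_and_genes(hits):
    """Return (number of annotated hits, number of genes with >= 1 hit).

    hits: list of (hit_id, gene_end1, gene_end2); a gene is None when
    that end of the hit does not overlap a gene.
    """
    annotated = sum(
        1 for _, g1, g2 in hits if g1 is not None or g2 is not None
    )
    genes = {g for _, g1, g2 in hits for g in (g1, g2) if g is not None}
    return annotated, len(genes)


hits = [(1, "a", "b"), (2, "a", None), (3, None, None)]
print(annotated_and_genes(hits))
```

Gene "a" has two hits but is counted once in Genes, while both of its hits count toward Annotated.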
All query results are listed under the >Results tab on the left. Clicking this tab shows the table of queries illustrated
above. Query results can be displayed by clicking a result on the left panel, or double-clicking it in the list of results table shown above.
The only way to remove query results from the left tab is by selecting them in this table followed by Remove Selected Query.
|