The following discusses building a SyMAP v5 database.
Supporting documentation
For Transcriptome Analysis and Comparative Transcriptomes, see TCW.
|
Overview and Publications
SyMAP is a system for computing, displaying, and analyzing syntenic alignments between
medium-to-high divergent eukaryotic genomes. Recent changes have improved
its performance for less divergent eukaryotic genomes (use Cluster Hit Algo2).
Its features include the following (for a pictorial introduction, see the
Tour):
- Find synteny between two sequenced eukaryotic genomes with optional annotation.
- Draft sequence ordering by synteny, i.e. align a draft genome to a fully sequenced genome (not draft-to-draft).
- For multiple selected synteny pairs, display using dot plot, circular, and side-by-side views.
- Complete queries on annotation, collinear genes, cross-species gene families, etc.
Publications
The back-end processing of SyMAP runs MUMmer¹,² for the alignments (included in the tarball) and computes
the synteny blocks from the alignment results. The SyMAP synteny algorithm is described in the following
two publications (though some unpublished updates have been made since). A sketch of the algorithms is also provided here.
C. Soderlund, M. Bomhoff, and W. Nelson (2011)
SyMAP: A turnkey synteny system with application to plant genomes.
Nucleic Acids Research 39(10):e68.
C. Soderlund, W. Nelson, A. Shoemaker and A. Paterson (2006)
SyMAP: A System for Discovering and Viewing Syntenic Regions of FPC maps
Genome Research 16:1159-1168.
SyMAP is freely distributed software; however,
if you use SyMAP results in published research, you must cite one
or both of the above articles along with the external program MUMmer¹,².
Steps for finding synteny
Follow the steps below to get started with SyMAP.
1. Use a Linux or MacOS machine.
It needs to have Java v17 or later, and sufficient processing power.
See system requirements.
2. Set up MySQL.
See MySQL.
3. Download SyMAP.
Installation is a simple unzip. See installation.
4. Run the demo.
Highly recommended. See running the demo.
5. Prepare sequences and annotation.
Sequences can be in one or many files and can be masked or unmasked.
See preparing the sequences.
Annotation format is gff3; see annotation files.
6. Load the files into SyMAP.
The SyMAP interface makes this easy; see
creating a new project.
7. Compute alignments and synteny.
This is also easy through the SyMAP interface. See
runtime and memory.
8. View results.
A detailed
description of the user interface is in the User Guide.
You must have Java version 17.0.10 or later, and MySQL or MariaDB. You must also have Perl for MUMmer.
The machine must be a 64-bit machine (32-bit will no longer work starting with v5.5.1).
The released symap.jar file has been compiled with Java 17.0.10, which is upward compatible.
If you need a version compiled with Java 1.8, email symap@agcol.arizona.edu.
For performing large alignments (e.g. 1Gb genomes or more) it is essential
to have multiple CPUs and at least 5Gb of RAM for each CPU that you intend to use. Note that you can set the
number of CPUs for SyMAP to use.
See MUMmer
for information on insufficient memory.
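The guideline above (about 5 GB of RAM for each CPU used in a large alignment) can be turned into a quick estimate of how many CPUs to request. The following is a minimal sketch only; the function name `max_alignment_cpus` and its arguments are hypothetical and are not part of SyMAP.

```python
def max_alignment_cpus(total_ram_gb: float, cores: int, gb_per_cpu: float = 5.0) -> int:
    """Estimate how many CPUs can run alignments without exhausting RAM.

    gb_per_cpu defaults to the 5 GB-per-CPU guideline from this guide.
    """
    if gb_per_cpu <= 0:
        raise ValueError("gb_per_cpu must be positive")
    by_memory = int(total_ram_gb // gb_per_cpu)
    # Never exceed the core count; 0 means too little RAM for even one CPU.
    return max(0, min(cores, by_memory))
```

For example, `max_alignment_cpus(16, 8)` suggests 3 CPUs for a 16 GB, 8-core machine. For very long or repetitive chromosomes, a larger `gb_per_cpu` value may be more realistic.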
For viewing alignments, CPU and memory needs are typically negligible, unless
you are performing queries on more than 4-5 genomes at once.
SyMAP has been tested on the following:
Machine | MySQL | Java | Core | Memory | Purchased
| v5.4.1 and later:
| MacOS x86_64 (Sonoma 14.4.1)¹ | MySQL v8.0.33, MariaDB 11.0.2 | 8, 15, 17, 18, 20 from Adoptium and Oracle | 3.2 GHz 6-Core | 64 GB | 2018
| Linode (Ubuntu 22.04.2 LTS)² | MySQL 8.0.33 | 17 | Nanode | 1 GB | 2023
| v5.4.0 and earlier:
| Linux amd64 (CentOS)³ | MariaDB v10.4.12 | 1.8 | 2.3 GHz 24-Core | 128 GB | 2011
| MacOS x86_64 (Catalina 10.15.4) | MySQL v8.0.17 | 17.0 | 3.2 GHz 6-Core | 64 GB | 2018
| MacOS x86_64 (Mavericks 10.9.5)⁴ | MySQL v5.6.21 | 1.8 | 2.4 GHz 5-Core | 16 GB | 2011
|
¹ MacOS Sonoma has been tested with:
• Arabidopsis thaliana (119MB), Brassica rapa (297MB) and Brassica oleracea (447MB);
• Prunus persica (227MB) with draft Prunus yedoensis (449MB);
• Chromosomes 5,17,X of Homo sapiens (420MB), Mus musculus (412MB) and Pan troglodytes (420MB)
and all chromosomes of Oryctolagus cuniculus (rabbit, 2GB).
² The Linode Nanode was too small to run MUMmer, so the MUMmer demo result files were transferred to the
data/seq_results/demo_seq_to_demo_seq2 directory. This allowed all other features to be tested on a small database.
³ Linux amd64 was used extensively on large plant genomes, e.g. maize (2365MB), rice (400MB) and sorghum (730MB).
⁴ The MacOS Mavericks machine is an old laptop. It could not download from Github, so v5.0.8 was downloaded from AGCoL, and
/ext/mummer/mac had to be replaced with /ext/mummer/mac_pre506. This machine is fine for viewing a SyMAP database,
but too small to process large input files.
Installation
Installation consists of unzipping the download tarball using the command
> tar -xf symap_5.tar.gz
This can be done anywhere and it creates a directory called symap_5. You can
move this directory later if desired. The contents are:
LICENSE README data/ ext/ java/
scripts/ symap symap.config viewSymap
Data: The data/ directory contains a /seq sub-directory,
which contains the demo files, and is the default location for all input sequence files.
Externals: The ext/ directory contains the external programs MUMmer¹,² (for sequence alignment)
and MUSCLE⁷ (for Queries).
The directory contains:
README mummer/ mummer4/ muscle/
For MUMmer, see Executables
and Using MUMmer4.
On MacOS, you may also need MacOS externals.
If you have not used SyMAP before, it is essential to run the demos. After you have
installed MySQL, do the following:
- Change into the symap_5 directory.
- Edit symap.config and enter database
and host information (see MySQL).
- From the command line, type ./symap.
The first time you run SyMAP, it will create the database with information written to the terminal, e.g.
Creating database 'symapDemo' (jdbc:mysql://localhost/symapDemo?characterEncoding=utf8).
It will check your MySQL variables; if there are any "Suggested" changes,
see Trouble Shoot MySQL.
It will also
check that the provided external programs (e.g. MUMmer) are executable; if it shows any problems,
see Executables.
For MacOS, you may also need MacOS externals.
Synteny between two genome sequences
The Project Manager window opens showing the three demo projects provided with the SyMAP tarball.
Check Demo-Seq and Demo-Seq2.
A link Load All Projects will be displayed at the top of the right panel; select it to
load the projects, which will take several minutes.
If loading Demo-Seq takes
more than a few minutes, you may need to adjust the MySQL parameters; see
Trouble Shoot MySQL.
When done, the Manager will look like the image shown on the right.
In the Available Syntenies table, the cell for Demo_Seq2 and Demo_Seq will
automatically be selected.
Click the Selected Pair button to start the Alignment&Synteny.
The All Pairs button can also be selected, as it will perform Alignment&Synteny on all pairs
in the table.
The Alignment&Synteny takes less than 5 minutes on the MacOS 10.5 but could take up to 30 minutes
on a slow machine.
When done, the table will have a checkbox, signifying that the synteny is available for viewing.
Click Summary to view the summary shown on the right;
there may be slight differences in the #Cluster hits
because of different numbers of CPUs, MacOS vs Linux, etc (but the #Blocks come out the same).
To view the other interfaces, see Demo Results.
Once the alignments are computed, you can experiment with the Parameters without
having to redo the alignments.
Try using the Algorithm 2 (gene-centric) for Cluster Hits;
bring up the
Parameters window
and select Algorithm 2 with its defaults. This
gives a summary similar to this.
Load the Demo-Draft project. Under the Demo-Draft listing, you will see
the parameter "Order against: demo_seq2". With this setting,
the Demo-Draft contigs will be ordered using synteny to Demo-Seq2; this was set in the Project Parameters window.
Run the Alignment&Synteny, where the alignment should take less than 30 minutes with one CPU.
When done, open the Summary for the pair, as shown on the right; as mentioned above, there
may be slight differences in the number of anchors.
See the first dot plot in Demo-draft.
It is recommended that Cluster Hits Algorithm 1 be used for ordering draft sequence.
The ordering algorithm changes the order of the draft contigs in the database, but does
not change the sequence files on disk. However, it writes the following files:
1. File of ordered contigs: It writes the order of the contigs along with
whether they should be flipped to a file called /data/seq/demo_draft/ordered.csv.
2. Fasta files of ordered sequences:
It creates sequence files from the ordered contigs that are flipped when appropriate,
which are put into a new project
with the suffix "_ordered", as shown in the image below. The chromosome names correspond
to the order-against project (e.g. demo_seq2), and the third chromosome is 'chr0'
which contains all draft sequences that were not placed.
As shown in the image on the left, a new project has been added. Click Demo_Draft-ordered and
load it.
Running the Alignment&Synteny
between the _ordered project and Demo_seq2 will provide a more coherent display
(see the second dot plot for Demo-draft).
Using the project Parameters window,
the Demo_Draft-ordered name can be shortened.
Self alignment
The section on Self alignments discusses the self-alignment of the Demo_Seq project.
The functionality of all links and buttons on the Project Manager
is listed
in the Quick Guide; the following provides details.
MySQL and parameters
If your machine does not have MySQL or MariaDB, download and install it. For example,
MySQL can be downloaded from dev.mysql.com.
On a personal MacOS, simply download the '.dmg' file and follow the instructions.
On a work server, the system administrator may need to install it.
Important Note: The default settings of MySQL are poorly suited for large-scale
data storage. You will want to adjust the parameters
innodb_buffer_pool_size and innodb_flush_log_at_trx_commit as described in
Trouble Shoot MySQL.
Parameters for accessing the MySQL database
should be set in the symap.config file in the main symap directory, as follows:
Database Parameters
db_name | Name of the MySQL database, which SyMAP will create when it first reads symap.config. It is standard to start the name with symap, e.g. symapDemo.
db_server | The machine hosting the MySQL database, e.g. myserver.myschool.edu. If using your local machine, enter localhost.
db_adminuser | MySQL username of a user with sufficient privileges to create a database. It is also necessary for loading, deleting and running synteny.
db_adminpasswd | Password of the admin user.
db_clientuser | MySQL username of a user with read-only access. This is only necessary if you want a machine to run viewSymap as read-only.
db_clientpasswd | Password of the client user (if db_clientuser is non-blank).
Example symap.config.
db_name = symapDemo
db_server = localhost
db_adminuser = <adminid>
db_adminpasswd = <password>
db_clientuser =
db_clientpasswd =
To use a file other than symap.config, use the "-c" command line argument, e.g.
> ./symap -c symapTmp.config
This is useful if you have multiple SyMAP databases.
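When juggling several configuration files, it can help to check one before starting SyMAP. The sketch below parses the `key = value` layout shown in the example above. It is illustrative only: the function names are hypothetical, treating lines that start with `#` as comments is an assumption, and treating the four admin and database entries as mandatory is inferred from the parameter descriptions rather than from SyMAP itself.

```python
def read_symap_config(text: str) -> dict[str, str]:
    """Parse 'key = value' lines like those in the example symap.config."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and lines starting with '#' carry no settings.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def missing_required(params: dict[str, str]) -> list[str]:
    """Return the database parameters that are absent or blank.

    Assumption: these four entries are needed to create and load a database.
    """
    required = ["db_name", "db_server", "db_adminuser", "db_adminpasswd"]
    return [key for key in required if not params.get(key)]
```

A typical use is `missing_required(read_symap_config(open("symapTmp.config").read()))`, which returns an empty list when nothing obvious is missing.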
Runtime and Memory
If SyMAP runs out of memory, see Trouble Shoot.
If MUMmer runs out of memory, see MUMmer.
The largest component of SyMAP execution time is running
MUMmer¹,². The time and memory that MUMmer needs depend on the size
of the genomes.
For example, aligning rice (12 chromosomes, 370Mb) to maize (10 chromosomes, 2Gb)
required 1 hour and 3 minutes using 8 CPUs at 2.3 GHz.
MUMmer typically uses about 5 GB of memory per CPU; however, it can
use as much as 10 GB for very long or repetitive chromosomes. If MUMmer fails, it is often due
to insufficient memory; see the MUMmer document, which
explains how to determine the problem and ways around it. It also explains the
CPU and Concat options, and how
to run MUMmer on a different machine and port the results to the symap_5/data/seq_results directory.
Start symap (i.e. ./symap).
The sub-directories of /data/seq will be listed on the
left panel under Projects. You may create the project as discussed in Directory
structure, or you may create a new project via the SyMAP interface.
To create a new project via the SyMAP interface, press the Add Project button
at the lower left. Enter the project name beside Name:.
This is the directory name for the project. It should be a short and unique name containing only letters,
numbers, and underscores.
In the Project parameters, you will be able to add a more descriptive display name.
After saving the new project, it appears in the Projects list on the left, but
it is still an empty shell. A directory will be made under the /data/seq,
e.g. for the project added on the right, a directory will be created called /data/seq/foobar.
Check its box and it will appear in the Selected section (right hand side).
SyMAP Parameters
The sequence file(s) must be FASTA format with one or more sequences.
For the FASTA format, the name of a sequence is the string immediately following the ">", e.g.
>Chr3 Oryza Sativa
GAATTCGAATTTGGGTAATGCTAATCAATACAGGTCAAAATCTATGTATTGAGTGGAATATACTGCAAAGTAATTACCTT
CTTCCAAAGGAAAGCATTCCTTCTCTCTTGTGGGACTAGCAGATGATCTCGCAGCCAAGACGTGACCACCCAAGGCTCAC
...
In this example, the sequence name is "Chr3" and the Chr3 sequence follows. The additional information
"Oryza Sativa" is ignored.
The first decision with whole-genome sequence is whether to use repeat-masked sequences.
- Masking reduces alignment time and false-positive hits,
but also runs the risk of concealing true hits due to inaccurate masking.
- SyMAP does not perform repeat-masking, so it must be done with another program.
However, you may obtain masked sequences from NCBI or Ensembl.
NCBI provides soft-masked sequences, which the
scripts/ConvertNCBI script will convert to hard-masked, and Ensembl
provides both soft-masked and hard-masked sequences (see Convert).
-
Masking is not really necessary unless the genome is highly repetitive
and those repeats are shared with other genomes being aligned.
(Repeats cause particular trouble for self-alignments,
see self-alignments in SyMAP).
- Occasionally, MUMmer fails when aligning sequences
(see MUMmer, which explains detection and solutions
for known reasons for failure).
Another masking option, available if you have gene
annotation, is to mask out everything but the annotated genes. You
can enable the mask_all_but_genes option in the project's Parameter window;
turn it on before doing the alignments.
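To make the soft-masked versus hard-masked distinction concrete: soft-masked FASTA marks repeats in lowercase, while hard-masked FASTA replaces them with N. The sketch below shows that conversion in its simplest form. It is not the scripts/ConvertNCBI script, and the function name `hard_mask` is hypothetical.

```python
def hard_mask(fasta_text: str) -> str:
    """Convert soft-masked FASTA (lowercase repeats) to hard-masked (N)."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)  # header lines are left untouched
        else:
            # Every lowercase base is treated as masked and becomes N.
            out.append("".join("N" if c.islower() else c for c in line))
    return "\n".join(out)
```

For large genomes, the same logic would normally be applied line by line while streaming the file, rather than to one in-memory string.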
Important points in naming sequences for SyMAP:
A. Sequence names can only contain letters, numbers, and underscores.
B. The sequence names must exactly match those
used in the annotation files (first column), or the annotations will not be loaded.
C. Use a consistent prefix such as "Chr" for all sequences,
then set 'Group prefix' to the prefix in the project's Parameter window.
D. If there is not a consistent prefix, you may leave the 'Group prefix' blank;
beware, this can have unintended results, so it should be avoided if possible.
Make the names short so they will not clutter the display. You may
need to rename your sequences, in which case it must be done in both the FASTA
and GFF files.
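The naming rules above can be checked before loading a project. The following sketch extracts each sequence name (the first token after ">") and reports names that break the character rule or lack the chosen prefix. The function names are hypothetical, and this is a pre-check under those stated rules, not SyMAP's own validation.

```python
import re

VALID_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def sequence_names(fasta_text: str) -> list[str]:
    """Return the name of each sequence: the first token after '>'."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    return names


def check_names(names: list[str], prefix: str = "") -> list[str]:
    """Return a list of problems; an empty list means the names look fine."""
    problems = []
    for name in names:
        if not VALID_NAME.match(name):
            problems.append(f"{name!r}: only letters, numbers, and underscores are allowed")
        elif prefix and not name.startswith(prefix):
            problems.append(f"{name!r}: does not start with prefix {prefix!r}")
    return problems
```

For the header `>Chr3 Oryza Sativa`, `sequence_names` returns `Chr3`, and `check_names(["Chr3"], prefix="Chr")` reports no problems.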
Annotation files should be in
gff3 format.
The first column (seqid) must exactly match the sequence names in
the FASTA files. The third column (type) determines how SyMAP
uses the entry. Types "gene", "exon", "centromere", and "gap"
are recognized (other entries are ignored). Exons are entered in the order they are found.
The last column (attributes) contains "keyword=value" pairs describing
the annotation.
- For genes, all attributes are saved in the database for viewing.
You can set which attribute keywords to save; open the
project's Parameter window and look for the parameter Anno keywords.
- For exons, the "parent" keyword=value is saved for viewing.
Note: As of v5.1.8, the genes and exons must be in one file.
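The column rules above can be illustrated with a small parser that keeps only the four recognized types, splits the attributes column into keyword=value pairs, and reports seqids that have no matching FASTA name. It assumes standard GFF3 layout (nine tab-separated columns, URL-encoded attribute values). The function names are hypothetical and this is not SyMAP's loader.

```python
from urllib.parse import unquote

RECOGNIZED_TYPES = {"gene", "exon", "centromere", "gap"}


def parse_gff3_line(line: str):
    """Parse one GFF3 feature line; return None for comments or ignored types."""
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] not in RECOGNIZED_TYPES:
        return None
    attributes = {}
    for pair in cols[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = unquote(value.strip())
    return {
        "seqid": cols[0],
        "type": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "strand": cols[6],
        "attributes": attributes,
    }


def unmatched_seqids(gff_lines, fasta_names):
    """Return seqids of recognized features that have no FASTA sequence name."""
    known = set(fasta_names)
    missing = set()
    for line in gff_lines:
        feature = parse_gff3_line(line)
        if feature and feature["seqid"] not in known:
            missing.add(feature["seqid"])
    return sorted(missing)
```

A non-empty result from `unmatched_seqids` points to names that differ between the FASTA and annotation files, for example `chr2` versus `Chr2`.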
NCBI files: A Java program (scripts/ConvertNCBI.class) has been provided that
converts NCBI genome FASTA files and gff annotation files
into the format that works best with SyMAP.
See the documentation for instructions.
Ensembl files: A Java program (scripts/ConvertEnsembl.class)
has been provided that converts Ensembl genome FASTA files and gff3 annotation files
into the format that works best with SyMAP.
See the documentation for instructions.
If you are ordering the draft sequence against a closely related sequenced genome, see demo draft on how to proceed.
It is strongly suggested that you run the demo! In a nutshell, the steps are:
- Load both sequences.
- For the draft,
bring up the Parameters window. Beside the Order against row is a
drop-down of all loaded projects; select your whole genome project.
- Run the Alignment&Synteny. At the end, you will see a new project with the suffix
_ordered in the left panel. It will contain:
- A sequence directory with a ".fa" FASTA file. Any scaffolds matching the Order against
genome will be assigned the same chromosome name. All scaffolds aligning to an Order against chromosome
will be appended together in order with 100 N's between each scaffold.
Any extra sequences will be put in ">Chr0".
- An annotation directory with a ".gff" file that specifies
where the gaps are.
- Load the _ordered project. Then run Alignment&Synteny between the whole genome
project and the new _ordered project.
The original draft directory will have a new file called ordered.csv that will contain
the merged contigs and what chromosome number they are mapped to.
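The construction of an ordered chromosome described above (scaffolds appended in order, flipped when appropriate, with 100 N's between them, and gap locations recorded for the annotation file) can be sketched as follows. This is an illustration under assumptions: "flipped" is taken to mean reverse-complemented, the function names are hypothetical, and the actual layout of ordered.csv and the gap ".gff" file is not reproduced here.

```python
GAP = "N" * 100  # the guide states that 100 N's are placed between scaffolds

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Flip a contig: complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_pseudo_chromosome(placed):
    """Join contigs into one sequence and report where the gaps fall.

    placed: list of (sequence, flipped) tuples, already in their final order.
    Returns (sequence, gaps), where gaps are 1-based inclusive (start, end).
    """
    parts, gaps, pos = [], [], 0
    for i, (seq, flipped) in enumerate(placed):
        if i > 0:
            gaps.append((pos + 1, pos + len(GAP)))
            parts.append(GAP)
            pos += len(GAP)
        parts.append(reverse_complement(seq) if flipped else seq)
        pos += len(seq)
    return "".join(parts), gaps
```

Each returned gap interval corresponds to one "gap" feature that an annotation file for the ordered project would need to describe.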
If the draft sequence is in too many pieces,
(1) the MUMmer comparisons take a long time, (2)
the display is very cluttered, and
(3) the blocks display does not work correctly. Limit the number of sequence pieces
by setting Minimum length in the project's Parameter window so that only the largest 150 sequences are loaded;
the script scripts/lenFasta.pl
prints out all the lengths, so set Minimum length to the 150th largest length. However, even 150 sequences produce a lot of blocks to view,
so you might want to start with the largest 50, merge them, then repeat.
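Picking the Minimum length value amounts to sorting the sequence lengths and reading off the 150th largest. The sketch below does this in Python as an alternative illustration to scripts/lenFasta.pl. The function names are hypothetical, and it assumes that sequences whose length equals the cutoff are still loaded.

```python
def fasta_lengths(fasta_text: str) -> dict[str, int]:
    """Return the length of each sequence, keyed by its name."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line.strip())
    return lengths


def minimum_length_for_top(lengths: dict[str, int], keep: int = 150) -> int:
    """Length of the keep-th longest sequence, for use as 'Minimum length'."""
    ordered = sorted(lengths.values(), reverse=True)
    if not ordered:
        return 0
    # With fewer than `keep` sequences, the shortest one sets the cutoff.
    return ordered[min(keep, len(ordered)) - 1]
```

If several sequences share the cutoff length, slightly more than `keep` sequences may pass the filter.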
To perform self-synteny, select the cell for the same project followed by
Selected Pair. The All Pairs option does not include self-synteny.
By default, SyMAP uses the MUMmer 'NUCmer' program for self-alignments. Each chromosome
is compared to every other chromosome including itself.
The Alignment&Synteny Parameters window has an option to set Self Args,
which is only used when comparing the chromosome sequence file
to itself. Make sure that the Algorithm 1 option is selected.
The self-synteny of demo_seq shows a few tiny blocks. A better demonstration is to download
Arabidopsis thaliana from NCBI, convert it with the
NCBI convert script, and run the self synteny. It took 16 minutes with one processor on a Mac Mini (2018) with
64GB main memory. The dot plot is shown on the right (click on the image for a closeup view).
The "--maxmatch" parameter was formerly the SyMAP default, but now it is up to the user
whether to add it to Self Args (it can greatly increase the execution time).
Reasoning for using "--maxmatch": MUMmer ordinarily seeds its alignments with unique matches,
which eliminates the possibility of off-diagonal seeds in the alignment of a chromosome to itself.
To overcome this problem, individual chromosome self-alignments can use the MUMmer parameter "--maxmatch",
which removes the uniqueness requirement at the cost of greatly increased noise. The extra noise is then
filtered to a large extent by the default SyMAP filters, but the diagonal squares of the dot plot will
still have more noise visible than the off-diagonal squares.
The Load and A&S methods have a popup progress window, as shown on
the right. There is a Cancel button on the bottom that can be clicked to cancel the execution;
it will remove the results from the database and exit.
Occasionally, Cancel causes an error to be written to error.log
or to the terminal. This is not a problem, though you may need to remove the results yourself.
If MUMmer is running when you Cancel, make sure an "Error: Failed command:" line is written
to the terminal for each MUMmer alignment that was running; if not, use the Linux "top" command to
view the running processes and stop any MUMmer processes that are still running.
General
For SyMAP v5, the default location for sequence and annotation files is:
/data/seq/<project-name>/sequence
/data/seq/<project-name>/annotation
You may do one of the following:
- Create these sub-directories under /data/seq and put your files there.
- Create these sub-directories under /data/seq and use soft links to point to the file locations,
e.g.
cd data/seq
mkdir foobar
cd foobar
ln -s <location of directory of sequence files> sequence
ln -s <location of directory of annotation files> annotation
- Use the project parameter window to enter the location of the sequences and optional annotation.
For options 1 and 2, it is not necessary to enter the locations of the files in the parameter window since
both use the default locations. Beware: executing "Remove project from disk" can, depending on the
location of the data files, remove them as well.
The result files of the alignment and synteny computations are written as follows:
/data/seq_results/<project-name1>_to_<project-name2>/align
/data/seq_results/<project-name1>_to_<project-name2>/final
After the database is complete, these can be removed. However, sometimes SyMAP version updates require
the project files to be reloaded and/or the synteny to be recomputed; if these files remain, the existing
MUMmer files will be used, which saves a lot of time.
The log files are in the /logs directory, see
Running MUMmer for more details.
External programs, MUMmer and FPC
External programs
See Installation.
MUMmer with SyMAP details
All MUMmer details are in a separate document,
see MUMmer.
This includes troubleshooting when MUMmer fails, and running MUMmer outside of SyMAP.
For working with FPC⁸,⁹, it is suggested you use release v5.0.8 from
AGCoL.
- It has the FPC demo files.
- It has BLAT³ in the /ext directory.
- It has the tar file doc.tar.gz of the documentation.
- The AGCoL documentation applies to this release. See AGCoL
System Guide.
- This is the last release made from AGCoL, so the documentation will stay consistent.
- It is also available from Github.
If you run into any problems, please do not hesitate to contact symap@agcol.arizona.edu.
This section provides a brief overview of the SyMAP processing steps; for
more, see the SyMAP published papers⁵,⁶. The processing
has four phases:
Alignment:
The sequences are written to disk *, with gene-masking
if desired. In the alignment, one species is the "query" and the other is the
"target".
The query is the project whose name comes first alphabetically. The query sequences
are written into one large file, while smaller target sequences are grouped
into larger FASTA files of up to 60Mb each, for more efficient processing
in MUMmer. If the Concat option is unchecked, the query sequences
are treated the same as the target; i.e. generally there will be more sequence files to
process, but they will be smaller.
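Grouping target sequences into files of up to 60Mb can be pictured as a simple packing step. The sketch below fills each group in the given order until the next sequence would exceed the limit. The guide does not state which packing strategy SyMAP uses, so this in-order approach and the function name `group_sequences` are assumptions for illustration.

```python
MAX_GROUP_BASES = 60_000_000  # the guide gives 60Mb as the upper file size


def group_sequences(lengths, max_bases=MAX_GROUP_BASES):
    """Pack (name, length) pairs, in the given order, into groups.

    Each group totals at most max_bases, except that a single sequence
    longer than max_bases is placed in a group of its own.
    """
    groups, current, current_size = [], [], 0
    for name, length in lengths:
        if current and current_size + length > max_bases:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += length
    if current:
        groups.append(current)
    return groups
```

Each returned group would correspond to one FASTA file passed to MUMmer, so fewer and larger groups mean fewer MUMmer runs.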
Anchor Clustering and Filtering:
The raw anchor set consists of the hits found by MUMmer, which are filtered and clustered for input to the synteny algorithm.
Algorithm 1 (original) is good for medium-to-high divergent genomes, aligning draft sequence, self-synteny, and genomes with little or no annotation.
The MUMmer hits are first clustered into gene, or putative-gene hits. This is
done by clustering the hit regions on each sequence, and then defining new "gene"
hits which connect these regions. For example if three separate
exons hit between two genes, they will be clustered into one "gene"
hit having a combined score equal to the sum of the raw hit scores.
Clustering is by gene if the hits overlap annotation; otherwise, it creates
"candidate genes" from hits that do not overlap annotation.
The clustered "gene anchors" are then filtered using a version of
reciprocal-best filtering which is adapted for retaining duplications and
gene families. For each pair of genes (or putative genes) which is
connected by a clustered anchor, the retained anchors must be among
the top two anchors by score on both sides (top-2 allows for one
ancestral whole-genome duplication). An anchor will also be retained if its
score is at least 80% of that of the 2nd-best anchor on each side (this
allows for retention of gene family anchors). These filter parameters
may be adjusted through the Alignment&Synteny Parameters window.
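The top-2 and 80% rules above can be expressed as a single threshold per gene: an anchor passes on one side if its score is at least 80% of the second-best score for that gene, which automatically includes the top two anchors. The sketch below applies that test on both sides. It is one reading of the description, not SyMAP's code; the function name, the tuple layout, and the treatment of genes with fewer than two anchors are assumptions.

```python
from collections import defaultdict


def filter_anchors(anchors, top_n=2, fraction=0.8):
    """Keep anchors that pass the top-N / fraction test on both genes.

    anchors: list of (gene_a, gene_b, score) clustered hits.
    """
    scores_a, scores_b = defaultdict(list), defaultdict(list)
    for gene_a, gene_b, score in anchors:
        scores_a[gene_a].append(score)
        scores_b[gene_b].append(score)

    def threshold(scores):
        # Score of the N-th best anchor, relaxed by `fraction`;
        # with fewer than N anchors, every anchor passes.
        ranked = sorted(scores, reverse=True)
        if len(ranked) < top_n:
            return 0.0
        return fraction * ranked[top_n - 1]

    limit_a = {gene: threshold(s) for gene, s in scores_a.items()}
    limit_b = {gene: threshold(s) for gene, s in scores_b.items()}
    return [
        (a, b, s) for a, b, s in anchors
        if s >= limit_a[a] and s >= limit_b[b]
    ]
```

With scores 100, 90, 80 and 50 for one gene, the second-best score is 90, so the cutoff is 72: the first three anchors are retained (the third as a possible gene family member) and the 50 is dropped.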
Algorithm 2 (gene-centric) is good for low-to-high divergent genomes with good annotation. It directly maps hits
to the exons and introns. Hits aligning to un-annotated regions are clustered separately. There are many more parameters for
this approach, as the hits are filtered based on the parameter values.
Synteny Block Detection:
After the clustered anchors are loaded into the database, the
synteny block algorithm runs. This algorithm looks for approximately-collinear
sequences of anchors, subject to several parameters including (A) Number
of anchors; (B) Collinearity of the anchors; (C) Amount of "noise" in the
surrounding region (to help reject false-positive chains). Criterion A can
be adjusted in the Alignment&Synteny Parameters window.
* Note that the sequences are re-written from the database to the
disk for three reasons: (A) To allow re-grouping for efficiency; (B) To ensure elimination
of invalid characters; (C) To mask non-gene regions, if desired. This also ensures that
sequence names will match those in the database, and prevents problems caused by
moving the source sequences on disk.
How to update SyMAP with a new release
If you have been working with SyMAP and have existing projects, you can update to a new SyMAP version by
downloading SyMAP, and:
- Put symap_5.tar.gz in a permanent location and untar it.
- Copy the /data directory and symap.config from your previous SyMAP location into this new location, replacing the new copies.
- This approach is safest as it acquires all changes (e.g. scripts) except for changes to the demo files.
or
- Put symap_5.tar.gz in a temporary location and untar it.
- Move symap_5/java/jar/symap.jar to the java/jar location of your permanent SyMAP.
- Check to
see if there are any /scripts or /ext changes that need to also be copied over.
The following table shows what versions require action by the user.
Release | Changed files¹ | Action by User
| v5.4.6-5.4.8 | Only the symap.jar | Alignment&Synteny²
| v5.4.1 | Only the symap.jar | Execute ./symap -y
| v5.4.0 | Only the symap.jar | Alignment&Synteny²
| v5.2.0 | symap, viewSymap, scripts/ConvertNCBI.* | Alignment&Synteny²
| v5.1.9 to v5.1.7 | Only the symap.jar | See note 3. If you have customized hit or gene colors, you will need to reset them.
|
1. Always get the new java/jar/symap.jar.
2. The Alignment&Synteny will use existing MUMmer files if they have not been removed.
3. To update to the latest Gene# assignment, the action depends on what version you have:
| v5.1.2 to v5.1.7 | Execute ./symap -z, then select Reload Annotation; the synteny algorithm does NOT need to be re-run.
| pre-v5.1.2 | Delete database, Load Project and run Alignment&Synteny. The database does not have
to be deleted, but it's cleaner to do so.
|
1 Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., and Salzberg, S.L. (2004). Versatile and open software for comparing large genomes. Genome Biology 5:R12.
2 Marcais, G., Delcher, A.L., Phillippy, A.M., Coston, R., Salzberg, S.L., and Zimin, A. (2018). MUMmer4: A fast and versatile genome alignment system. PLoS Computational Biology 14(1):e1005944.
3 Kent, J. (2002). BLAT--the BLAST-like alignment tool. Genome Research 12:656-664.
4 Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., Jones, S., and Marra, M. (2009). Circos: An information aesthetic for comparative genomics. Genome Research doi:10.1101/gr.092759.109.
5 Soderlund, C., Nelson, W., Shoemaker, A., and Paterson, A. (2006). SyMAP: A system for discovering and viewing syntenic regions of FPC maps. Genome Research 16:1159-1168.
6 Soderlund, C., Bomhoff, M., and Nelson, W. (2011). SyMAP: A turnkey synteny system with application to multiple large duplicated plant sequenced genomes. Nucleic Acids Research 39(10):e68.
7 Edgar, R. (2004). MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113.
8 Soderlund, C., Humphray, S., Dunham, A., and French, L. (2000). Contigs built with fingerprints, markers and FPC V4.7. Genome Research 10:1772-1787.
9 Engler, F., Hatfield, J., Nelson, W., and Soderlund, C. (2003). Locating sequence on FPC maps and selecting a minimal tiling path. Genome Research 13:2152-2163.