AGCoL TCW Release Notes
There have been many improvements; the following lists some recent key changes.
Version  Date        Program     Summary
v4.0.4   29-July-22  All         Small improvements.
v3.3.9   27-Dec-21   runAS       Improved interface for full subset UniProt; updated Demo.
v3.3.5   04-Nov-21   viewSingle  log2FC TPM analysis.
v3.3.4   18-Oct-21   runSingle   ORF finder improvements.
v3.3.1   01-Sep-21   runSingle   Added the Prune Hits algorithm.
v3.3.0   15-Aug-21   All         Updated DIAMOND and BLAST and their default parameters.
v3.2.7   20-Jul-21   viewSingle  Added enhanced Decimal Display (highlight p-values).
v3.2.5   04-Jun-21   runMulti    Fixed orthoMCL to work with MariaDB.
v3.2.3   09-May-21   viewSingle  Enhanced the Hit-GO evidence codes query and display.
v3.2.1   09-Apr-21   runDE       The built-in GOseq was replaced by a script run by runDE, which can be any user-supplied script.
v3.1.8   18-Mar-21   runAS       Changed the source of GO to the go-basic.obo file.
Earlier  2020

v4.0.4 29-July-2022

The last tar file had "._" MacOS files, which have been removed from this release.
  • Overviews: The HTML Overview files conform to HTML5.
  • Startup: All programs show the same banner on startup, including the memory and Java version.
  • Execute: All programs clearly show the log file and its size.

v4.0.3 14-July-2022

runDE
  • The Close and Exit buttons did not work correctly in either the runDE Database Chooser or the runDE window. They have been tidied up, and additional Help has been added.
runSingleTCW
  • ORF finder:
    1. Really tiny bug: Very occasionally, it assigned the ORF=Hit remark incorrectly when ORF=Hit-3 and there was one AA at the end instead of a stop codon (e.g. demoTra, tra_005).
viewSingleTCW
  • Seq-Hit align:
    1. The hit-ORF overlaps are highlighted in aquamarine (this happens when there is a stop codon in the hit region).
    2. Really tiny bug: The 3'UTR sometimes started one position early.

v4.0.2 7-July-2022

Most documentation is in the TCW desktop applications. The exception is the extensive documentation for building a single or comparative TCW database, which has moved from www.agcol.arizona.edu/software/tcw to csoderlund.github.io/tcw. All references to the online documentation within the code have been updated.

The HTML Help files can now contain URL links.

v4.0.1 27-Apr-2022

runMultiTCW
  • The cluster score for a Sum-of-Pairs has been changed to the sum-of-comparisons/#comparisons, where the #comparisons = (nSeqs*(nSeqs-1)/2) * nCols
  • Fixed a potential bug: if the MSA consensus sequence was very long, a MySQL error would occur.
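The normalization above can be illustrated with a minimal Python sketch. It assumes the MSA is a list of equal-length aligned strings. It uses a hypothetical `identity_score` (1 for identical non-gap residues, 0 otherwise) as a stand-in, because these notes do not describe the per-column scoring TCW uses. Only the division by (nSeqs*(nSeqs-1)/2) * nCols follows the note.

```python
from itertools import combinations
from typing import Callable, Sequence


def identity_score(a: str, b: str) -> float:
    """Hypothetical stand-in scorer: 1.0 for identical non-gap residues, else 0.0."""
    return 1.0 if a == b and a != "-" else 0.0


def sum_of_pairs_score(
    msa: Sequence[str],
    pair_score: Callable[[str, str], float] = identity_score,
) -> float:
    """Sum of all pairwise column comparisons divided by #comparisons."""
    n_seqs = len(msa)
    if n_seqs < 2:
        raise ValueError("need at least two aligned sequences")
    n_cols = len(msa[0])
    if n_cols == 0 or any(len(row) != n_cols for row in msa):
        raise ValueError("aligned sequences must be non-empty and of equal length")

    total = 0.0
    for col in range(n_cols):
        # Compare every unordered pair of sequences in this column.
        for s1, s2 in combinations(msa, 2):
            total += pair_score(s1[col], s2[col])

    # #comparisons = (nSeqs*(nSeqs-1)/2) * nCols, as in the release note.
    n_comparisons = (n_seqs * (n_seqs - 1) // 2) * n_cols
    return total / n_comparisons


# 4 of the 6 pairwise column comparisons match, i.e. a score of 2/3.
print(sum_of_pairs_score(["AC", "AC", "AG"]))
```

Because the sum is divided by the number of comparisons, the result is a per-comparison average. This should make scores for clusters of different sizes and alignment lengths easier to compare.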
viewMultiTCW
  • Fixed a potential bug: the pre-computed MSAdb would not display if score1<0.
runSingleTCW
  • Description prune update: (1) It was taking the hit with the most GOs, even if its bit-score was lower; now it only takes the hit with GOs if the other has none and the bit-scores are close. (2) A "{...}" at the end of a description was not being removed before finding unique descriptions.

Scripts

  • Added or changed a few scripts that are used for results in the next version of the bioRxiv publication.

v4.0.0 22-Mar-2022

General

  • Documentation and Help updates - mostly for mTCW.

runAS

  • The Check function writes to the terminal the UniProts added to an existing GOdb. For existing GOdbs, this information is updated on the first Check and stored in the GOdb.

viewMultiTCW

  • Explain on Overview: rewrote much of it for clarity.
  • Many small tidy-ups: (1) Overview: removed "(Avg %Sim)" on AA and NT pairs; slight change to KaKs counts. (2) Pair Filter: if KaKs was set to >=0.0, pairs with no KaKs value or NA were returned; if pairs and clusters were removed, and pairs were then added without clusters, the Pairs filter crashed. (3) Sequence Table: if the database is !NTonly, the Export CDS/NT sequence options are not offered; if there are no GOs (!GO), the Export GOs option is not offered.
  • Details: Add Copy... option to copy the displayed Detail sequence.

v3.4.2 8-Mar-2022

A few improvements and tiny bug fixes.

General

  • Made all button colors consistent (see Colors).
  • Put all Help buttons on the upper right corner.
  • Updated all documentation snapshots.

runSingleTCW:

  • ORF Finder
    • A few adjustments to the heuristics for choosing largest ORF versus best Markov score when there is no good hit.
    • There will be fewer ORF=Hit and more ORF>Hit remarks, because when a hit does not have a Start or Stop codon, more ORFs are extended than previously.
    • Previously, if a hit ended at a Start codon but did not have a Stop, it would look upstream for the furthest 5' Start. Now it uses the Start at the 5' end of the Hit.
    • A few changes to the ORF summary in anno.log.
  • Fixed a slight round-off error in the Overview ANNODB AVG %SIM.
    To regenerate the overview, execute ./viewSingleTCW <project name> -o.

viewSingleTCW:

  • Overview: Improved the Reproduce instructions.
  • Seq Table: If the Markov Score column was displayed, the Show Column Stats gave all 0's for its statistics.
  • Seq Details: (1) Added Help for the Details panel. (2) The file append was not working.
  • Basic AnnoDB Hits: The %HitCov value was sometimes off by 1.
  • Basic GO Annotation: The default minimum level to show was >= 1. Level 0 holds obsolete terms, which do occur, so the minimum level was changed to 0.

runMultiTCW:

  • Removed '--sensitive' from the DIAMOND default parameters.
    This makes a slight difference in the orthoMCL results, so projcmp/ex/orthoMCL.OM-4 has been updated.
  • The Remove "Clusters and Pairs" function is faster.
  • For an AA-mTCW, the pairs' AAlen1 and AAlen2 were set to 0; they now have the correct length.
  • The MSA redo had stopped working in v3.1.3 and has been fixed.

viewMultiTCW:

  • If a sequence had no hits, an error was written to mTCW.error.log (but it worked).

v3.4.1 22-Jan-2022

viewMultiTCW

  • List Results: Multiple rows can be selected for removal.
  • Cluster Table: The MSA... option for aligning the AA sequences of the cluster includes the best hit.
  • Hit Table: Has a new Pairwise... option that aligns either all nSeq sequences against each other or the row's HitID against all nSeq sequences (where nSeq is the number of sequences with the hit).
  • Sequence Table: Has a new option MSA... that allows sequences from different clusters to be aligned together.
  • Sequence Details: Has a new option Pairwise... that aligns the detailed sequence to one of its sequence pairs or hit sequence.
  • All tabs on the left and summaries have been standardized.

v3.4.0 14-Jan-2022

Colors: the various buttons were color coded on Linux, but not on Mac; moreover, their coloring was inconsistent. Now various buttons are color-coded on Linux and Mac, and represent the following:

beige        runs an algorithm
light green  replaces the current panel
rose         help
lavender     new window or popup
light gray   request for input file
light blue   tab on left of viewSingle and viewMulti

runMultiTCW

  • Loading a project with an existing mTCWdb is much faster.
  • The Overview now puts the Methods table in the Processing section.
  • Best Hit: The terminal output and Help are now clearer about the algorithm.

viewMultiTCW

  • The pairwise and MSA alignments have been moved from the sequence table: (1) the Pairs table has pairwise alignments and the (2) Cluster table has MSA alignments.
  • The alignments create a new tab on the left underneath the respective table.
  • All tables created by selecting one row from another table have Prev/Next buttons. The summary displayed for these row tables has been improved, and a few small bugs were fixed.
  • Run MSA on existing cluster: (1) Now shows the correct scores. (2) Shows the type of alignment (e.g. MAFFT CDS).

runSingleTCW

  • Overview: When writing to file, the filename is the database name instead of the sTCWid (the sTCWid is not necessarily unique).

Software: (1) All alignment routines have been moved from seq.align to align directory. A new file called AlignButton.java has the code for the alignment buttons. (2) All color codes have been consolidated in the methods/Static.java file.

v3.3.9 27-Dec-2021

Share: All interfaces that use Blast and Diamond have access to a Help page of parameters.

Demo: The demo has been updated to Dec 2021 and changed as described in #3 below.

runAS

  1. For the Full SwissProt and TrEMBL, there is now the option of creating the different subsets of the downloaded taxonomic databases and the complete SwissProt and/or TrEMBL. See Full subsets for an explanation.
     
  2. Adding UniProts in both runAS Build GOs and runSingleTCW Annotate has been updated to better handle duplicate SwissProt hits (which can happen if the full SwissProt is used along with taxonomic databases).
     
  3. The demo sp_fullSubset is renamed to sp_full. It contains the files uniprot_sprot.dat; uniprot_sprot_xBFxIxPxxx.fasta, which has the bacteria, fungus, invertebrate and plant SwissProt entries removed; and uniprot_sprot_xxxxIxxxxx.fasta, which has only the invertebrate entries removed.

v3.3.8 12-Dec-2021

runSingleTCW

  • Annotate:
    1. Loading FASTA files:
      • Speedup - cut the time in half on Linux.
      • On Mac, loading TrEMBL plants ran out of memory; this has been fixed.
      • UniProt FASTA headers: Changed the rules for parsing FASTA headers to better fit UniProt 2018 descriptions.
    2. Uninformative descriptions: The rules for computing uninformative descriptions have slightly changed. Though the rules should strictly adhere to the "International Protein Nomenclature Guidelines" bullet item "where no domain or motif is observed", many other uninformative descriptions seemed to be entered. The rules can be changed in the file "util/methods/BestAnno.java".
    3. Unique Description (displayed in Seq Detail): Slight changes.

viewSingleTCW

  • Basic AnnoDB Hits: The Add function has been significantly sped up.
  • Basic AnnoDB Hits and Sequences: The Delete selected and Keep selected functions were not displaying the table correctly (though it corrected itself on any sort, etc.).

TCW package

  • The "/doc" directory has been made into a tar file with instructions in the README on how to use it.

v3.3.7 30-Nov-2021

viewSingleTCW - minor changes

  • Improved some of the text of the Help popups.
  • Basic Filters: Buttons are now consistently made inactive when they cannot be used.
  • Basic GO: The "ADD" function would add duplicate GOs.
  • Seq Details: (1) The top drop-down buttons are only highlighted if an item is selected. (2) Sped up Seq hits - inherited.

v3.3.6 22-Nov-2021

This release has improvements to the Basic Filters (Seq, Hit, and GO).

viewSingleTCW

  • All Basic
    • Help is now organized under a single Help... drop-down.
    • New All Basic filters now highlight the selected rows in the table.
    • The information on the status line above the tables is consistent and more informative.
  • Basic GO:
    • New Add Select related in table, Select ancestors in table, Select descendants in table to the Show button.
    • New Add Select terminal terms for the Table... button; this selects all rows that do not have descendants in the table.
    • Select query was implemented, which executes the query and selects the intersecting rows.
    • Add Row number column.
  • Basic Sequences:
    • The SELECT ROWS feature has been moved to the bottom set of buttons and called Select Query.
  • Minor fixes: (1) Help now uses the term "bit-score" consistently; previously it varied.
    (2) Basic GO: "#Seq" on the Enrich line was not disabled when Search was selected; p-values were not checked for validity; level 0 (obsolete) terms were shown when the range started at 1; and Exports did not work correctly if the GOid was not displayed.

v3.3.5 4-Nov-2021

The main change in this release is a feature that allows log2FC analysis in addition to FC.

runSingleTCW

  • Changes to ORF finder: A few small changes to the output to anno.log to make it clearer.
viewSingleTCW
  • Sequence Table: Filter and Column:
    • The N-fold column and filters are given the option of displaying and filtering on the log2FC or just the FC.
    • Bug fix: Differential Expression: The "Any" for "Up" would only show up-regulated for all selected DE columns that had at least one <p-value; that restriction has been removed. Same for "Down".
  • Basic Hits: Seq Table, Delete Selected, Keep Selected for "Group by Hit ID":
    • If there were many seq-hit pairs, these functions could take a very long time. They are faster now but can still take a long time (a popup now advises the user).
    • Tiny rare bugs: (1) The Clear All did not work if there were no counts for the sequences. (2) The hit 'best align' value on rare occasions was wrong.
  • Seq Detail: - minor change
    • For Align Best Hits, it now aligns all best hits even if they are not displayed in the hits table (e.g. if Best Bits and Best Anno differ and Distinct Regions is displayed, one of the bests may not be in the table, so it would not have been aligned).

v3.3.4 18-Oct-2021

This release has improvements to the ORF finder.

runSingleTCW

  • Changes to ORF finder:
    • Algorithm: (the changes are fine-tuning results, no major changes)
      • For Markov training: it was using all sequences for training. Now it uses the longest N sequences (default 2000), after removing similar sequences. N can be changed by executing execAnno with the -t option.
      • Sort: (1) Instead of comparing the Markov scores directly, it tests whether Abs(log(score1)-log(score2)) > 0.3, where a negative Markov score is transformed to -log(-score); (2) If the length and Markov scores are similar, a 4th rule checks for ends (Start & Stop codon).
      • With Hit: If the hit ends at start/stop codons, always use those coordinates. Otherwise, it finds all possible ORFs (including Stop to Stop), and sorts for the best ORF.
      • Stop codons: If there were stop codons in the hit, it was not always finding the best coordinates; now it does.
      • No hit: ORFs that are Stop to Stop may now be considered if the Stop is far enough from the last Start.
      • N's in sequence: It used to try to avoid N's; now it does not. However, it does remove them from the length before taking the log for comparing lengths.
      • Minimal sequences for Markov training: The default was 50, which is way too low. It is now 500.
    • Selected ORFs that lack a Start and/or Stop codon will have a remark.
    • The output files are now sorted by SeqID.
    • Previously, allGoodORFs.pep.fa + bestORFs.pep.fa provided all candidate ORFs; now all candidate ORFs are in allGoodORFs.pep.fa.
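Sort rule (1) above can be sketched in Python as follows. This is an illustration under assumptions, not TCW's code. It assumes natural logarithms, since the note does not give the base. The note also does not say how a Markov score of exactly 0 is handled, so the hypothetical `signed_log` helper simply rejects it.

```python
import math


def signed_log(score: float) -> float:
    """log(score) for positive scores, -log(-score) for negative ones (natural log assumed)."""
    if score > 0:
        return math.log(score)
    if score < 0:
        return -math.log(-score)
    # The release note does not say how a score of exactly 0 is handled.
    raise ValueError("a Markov score of 0 is not handled in this sketch")


def markov_scores_differ(score1: float, score2: float, threshold: float = 0.3) -> bool:
    """True if the scores count as different under the > 0.3 rule; False means 'similar'."""
    return abs(signed_log(score1) - signed_log(score2)) > threshold


print(markov_scores_differ(10.0, 10.5))  # similar scores
print(markov_scores_differ(10.0, 20.0))  # different scores
```

Under the natural-log assumption, scores of 10 and 10.5 differ by about 0.05 in log space and count as similar, so the sort would presumably fall through to the next rule. Scores of 10 and 20 differ by about 0.69 and count as different.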
viewSingleTCW
  • Load File for all 3 Basic filters: only the first word per line will be read as a SeqID, OrigID, HitID or GOID. This allows files to be used that have other information on each line.
    • Basic GO annotations: Add "#" before column headings so an exported file can later be read in (The other Exports already do this).
  • New Sequence Table has new Export option to output the columns of the selected row.
  • Basic Sequence:
    • New Select one sequence from the table followed by Seq Detail to see the Sequence Detail panel. This is in contrast to Seq Table, which shows the sequences in the Sequence Table.
    • New The result of a search will SELECT ROWS from the existing table.
  • Results: This panel now shows all Sequence Detail labels from the left panel so that all results can easily be removed.
  • Changed a few labels, e.g. the "View Seqs" label to "Seq Table", and "View Selected Sequence" to "Seq Detail"
Bug fixes:
  • ORF finder: Sequence of length 0 crashed the ORF Finder.
  • Basic GO annotations: the #Seqs quit showing the number of DE seqs correctly (bug from v3.3.3).
  • Filter ORF Frame: Only worked for positive frames.
Other
  • demoTra: add N's to a few of the sequences.

v3.3.3 25-Sept-2021

viewSingleTCW - tiny fixes
  • Exports now write the correct filename of output to the terminal.
  • Sequence Detail - Frame: The Y-axis coordinate has been changed so that it can be added to the X-axis coordinate to get the last base of the respective codon.
  • Sequence Detail - Align:
    • The highlight UTR would incorrectly extend over a hit overhang.
    • The highlight HIT included one extra AA.
    • Trim showed an extra AA on the 3' end.
    • When the "Hit" was highlighted for a negative frame, the coordinates were often off by 1.
  • Main Table "Export GOs from Table" did not work if there were no GOseq values.
  • Basic GO: The "Show" was recently broken, and has been fixed.

v3.3.2 15-Sept-2021

runSingleTCW

  • Various features use "Rank=1", which was not being updated when pruning was applied.
  • "Remove Annotation" was not clearing the GOseq values totally.
viewSingleTCW
  • Pair Alignment: if multiple pairs are shown, the alignment will start in the same place across all alignments.
  • Verified all "Reproduce" information from "Overview" and improved the description.
  • AnnoDB Hits bug fix from v3.3.0: Filtering on "%HitCov" stopped working.

v3.3.1 1-Sept-2021

The singleTCW database has a small schema update that will be applied the first time an existing database is viewed.

runSingleTCW -- Annotate

  • sTCWdb version sdb6.0: The percent similarity (identity) is stored as a real number instead of an integer.
  • NEW Prune Hits - there are many hits with the exact same coordinates and/or descriptions. A new function removes all but the best based on same alignment values or same description. This function can be set to run in the AnnoDBs Options or from the command line.
  • Tiny changes:
    • DIAMOND: removed '--max-hsps 1' from the TCW defaults because it is already a DIAMOND default. Removed '--top 20' as this misses some good descriptions.
    • TCW was only loading 25 hits per annoDB. This restriction has been removed since the user can set the limit in the search program's parameters.
    • This really isn't a problem, but I fixed it anyway as it could be confusing: If there were multiple hits with the same bit_score and E-value, it was using the one with the highest rank for the "Best Bit" assignment, which was changed to the lowest.
    • Tiny bug fix in Multi-Frame: If a sequence had an NT hit, it was typically incorrectly marked as 'Multi-frame'.
viewSingleTCW - tiny adjustments
  • Sequence Detail:
    • A new Show button shows all columns for a selected hit (the Hit Table only provides a subset of all columns).
    • The Hit Table now sorts on bit-score, e-value, %sim in that order (the addition of %sim is new).
  • Basic Hit:
    • The Bit-score has been added to the columns.
    • DE values of "-" will sort to the bottom of the table.
  • A reading frame was being assigned to an NT hit -- it now has a value of "-".
  • Tiny bug fix in alignments: In rare instances, aligning an NT-NT followed by an NT-AA could create a bogus NT-AA alignment.
viewMultiTCW
  • Sequence Table: DE values of "-" will sort to the bottom of the table, and the sort will ignore the minus sign before a DE value (as viewSingleTCW does).
Software details
  • The Best Bits, etc assignment was moved from DoUniProt.java to DoUniAssign.java

v3.3.0 15-August-2021

Update existing multiTCW databases for this release: There is a small mTCWdb schema change, which will be updated the first time you access the database. However, the pairs table NTbest column will not have values unless you reload the AA/NT hit files.
  • Both DIAMOND and BLAST have been updated to their latest releases (v2.0.11.149 and 2.12.0+, respectively). A slight change to TCW was necessary for the latest DIAMOND, and there is better error checking for the Find Hits feature.
  • The TCW DIAMOND defaults have changed to the following:
    • Sequences against annoDB: --max-hsps 1 --masking 0 --top 20 (changed again in v3.3.1 1-Sep-21)
    • Self-search: --max-hsps 1 --masking 0 --sensitive --query-cover 25 --subject-cover 25
  • All numbers are displayed using the Display Decimal settings, except in the Overviews.
viewSingleTCW
  • A DE value of 3.0 meant it was not computed; this now displays as "-" instead of 3.0.
  • The Basic GO columns are in a more logical order.
runMultiTCW
  • Database change:
    • mdb6.5: Add NTbest pairs columns. There has been an AAbest column, where a '2' indicates the pairs are the bi-directional best hits; only these are considered for the BBH-AA computation. Having this column means that if the pair fails the BBH parameters, a second-best pair cannot be used. This was not being done for the BBH-NT computation, and now is. The AAbest was also used in the AA Cluster algorithm, and now the NTbest is used for the NT Cluster algorithm.
    • If the user chooses not to update the database, runMultiTCW now continues, though any action performed on it other than checking the overview will probably fail.
  • Computing pair alignments uses less memory and is therefore faster.
  • Assign Majority Hit to Cluster: The algorithm has a slight change where it will now only assign a 'Best Anno' from the sequences in the cluster (it was including Best Bit and Best GO). The documentation on this has been improved.
viewMultiTCW
  • For all filters on <=N and >M, if both N and M are 0, then the search is "=0".
  • The recently added KaKs=NA is now a checkbox to include or not include NA (it was yes/no/either).
  • When sorting a text column, blank values were always sorted to the bottom; now they are sorted like any other value.
  • Added pair table 'NTbest' column.
Minor
  • runMultiTCW (1) Always record whether the BBH or Closure clustering is AA or NT. (2) Tiny bug fix: When removing clusters only, it was not resetting the flag that told viewMultiTCW that the pre-computed MSAs did not exist (anymore).
  • viewSingleTCW Overview includes the runDE filtering options under PROCESSING INFORMATION.

v3.2.7 20-July-2021

This release has improved Decimal Display and multiple tiny improvements.

Decimal Display

  • This existed for viewSingleTCW and has been added to viewMultiTCW, though the latter does not have the color coding.
  • A third parameter has been added for the leading digits for E-notation, i.e. 1.2E-02 versus 1E-02.

runSingleTCW - all minor

  • Overview: slight changes to the AnnoDB and ORF section.
  • Compute ORFs:
    • The Default button resets to "Train with Best Hits". If "Train with CDS file" is selected but no file is entered, an error message will be written.
viewSingleTCW
  • Basic Sequence and AnnoDB Hits: The search rules have changed so that it always searches on "contained" unless the user adds a "%".
  • Sequence Detail (Bug Fix): The "Best Bits" hit should be listed first, but in some situations it was not, so the wrong hit was shown on the "Frame".
  • Very minor stuff:
    • Filter: only show User Remark option if there are user remarks.
    • Basic Sequence: The "Orig ID" has been moved so it follows the "Seq ID".
    • GO Basic GO ID: Failed if the substring started with spaces.
    • Moved "Results" under "General".
runMultiTCW
  • mTCW db v64: The OrigID (original ID) has been added to the sequence records.
  • There is better error checking on loading User Defined file.
  • Stats: Percentages with a numerator of 0 were being saved as "-"; they are now saved as 0.
  • The Overview .html file has centered contents.
viewMultiTCW
  • KaKs uses NA instead of "--" to better distinguish it from "-". NA means KaKs was computed but KaKs_calculator returned NA, whereas "-" means it was not computed because the alignment was too short.
  • View OrigID for sequence table and details.
  • List of cluster methods for filters and columns will be in order of creation, i.e. the same as shown on the overview.
Software Details
  • mTCW db v64: the OrigID was added to the sequence table and hasOrig to the info table.

v3.2.6 6-July-21

There is a small schema update (dbVer 5.9) for singleTCW; the first time a sTCWdb is accessed, it will be updated.

Dynamic Programming - used by both singleTCW and multiTCW

  • The Gap Open parameter has been changed from 4 to 7, which reduces gaps but increases mismatches.
runMultiTCW
  • Multi Align: Use built-in DP to create Multi-alignment if the cluster is size 2; this greatly speeds up this step.
  • New Remove...: Add Remove clusters from database.
  • Bug fix - Remove cluster: If a cluster set was removed after Multi Stats was run, then new clusters added, it did not work.
runDE
  • goSeq: runDE has the option to run this script on data written to R by runDE. Sometimes it is not clear why there are more or fewer enrichment p-values < 0.05; New a summary of statistics is written to help give some indication.
  • New Remove: The GO enrichment columns can be removed.
  • New The output to the terminal is appended to the file projects/RunDE/<dbName>.log.
  • DE for All Pairs: This option will not overwrite existing columns of the same name; the other three DE options ask the user if they want to over-write an existing column.
  • GO Enrichment: If executed on "All DE p-values", it will not overwrite an existing column. If executed on a selected DE p-value, it will overwrite an existing column.
  • The DE and GO p-value columns will be displayed in viewSingleTCW in the order that the DE p-value columns are created (only guaranteed if created with v3.2.6 or later).
  • The online DE User Manual was significantly updated.
runSingleTCW - Overview
  • AnnoDB: Stores annoDB stats so that it does not keep recomputing every time there is a change.
  • ORF Finder: Slight changes in the Overview statistics, the anno.log stats were removed, and a Markov remark is only assigned to a sequence if it is not the best over all 6-frame ORFs.
  • GO: Save the number of sequences that have a hit with the GO directly assigned.
viewSingleTCW
  • Basic GO:
    • New Export option to output p-value columns as -log10(p-value).
    • New Column #Assign, i.e. number of sequences with hits that directly have the GO assigned. This is useless for any kind of analysis, but is good for providing intuition and understanding of the complex Hit-Seq-GO assigned-inherited relations.
  • Pair alignment: The bit score was added to the header.
  • Pair text alignment: Local and Affine Gap - changed from Blosum50 to Blosum62, which is what the semi-global (default) algorithm uses.
  • BUG FIX - Basic Sequence: If the database was built using the original sequence names, the Search did not work.
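The -log10(p-value) export option is a simple transform; a minimal Python sketch follows. Rejecting p-values outside (0, 1] is an assumption of this sketch, since the note does not say how TCW writes a p-value of 0 or a missing value.

```python
import math


def neg_log10_p(p_value: float) -> float:
    """Convert a p-value in (0, 1] to -log10(p); smaller p gives a larger value."""
    if not 0.0 < p_value <= 1.0:
        # How TCW treats p = 0 or missing values is not stated; this sketch rejects them.
        raise ValueError(f"p-value out of range: {p_value}")
    return -math.log10(p_value)


print(neg_log10_p(0.01))
```

Smaller p-values map to larger numbers (0.01 becomes 2, and 0.001 becomes 3), which is why this scale is convenient for plotting or ranking enrichment results.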

v3.2.5 4-June-21

This release makes fixes and small changes to runMultiTCW.
  • Search Settings
    • Added a Defaults button, which sets everything back to the defaults.
    • Tiny bugs:
      • This got the wrong blast file in the following scenario: (1) Load an existing project. (2) Create a new project. (3) Build Database. (4) Search - used the initial existing project files.
      • If an existing project was loaded and then a new project was created, the existing project's search parameters were used.
      • If search parameters were set to blank, they showed as defaults on the Settings panel the next time the mTCW.cfg file was loaded.
  • Methods
    • OrthoMCL
      • This did not work with MariaDB v10.4.12.
        This has been fixed by editing the orthoLoadBlast script to set the MySQL local_infile from within the script.
      • TCW recently changed to using ".fa" suffixes for FASTA files, whereas OrthoMCL requires ".fasta" suffixes.
        This has been fixed by editing the orthoBlastParser to accept ".fa" suffixes.
    • Tiny bug: Non-default parameters were not used if the following order was followed: A method was added for a project with non-default parameters, exit runMultiTCW, restart runMultiTCW, then Add New Clusters.
  • Added a projcmp/ex directory with the results of running OrthoMCL within TCW on the three 'ex' demos; this can be added as a "User Defined" method.

v3.2.4 16-May-21

This release has improvements to viewSingleTCW for exploring GOs.
  • Basic GO, Basic Hit, Sequence Detail GO Panel
    • These three panels now have separate Show and Export buttons. There are various new Show.. and Export... options.
    • Many of the Export options allow the output to be All info or IDs only. The IDs only is very useful for then exploring the IDs in their respective Basic panel using the Load File.
    • Terminology, file names and displays have been made systematic.
    • The Basic GO option Hits with inherited GO listed had duplicate hits.
  • Display Decimal
    • All parameters are saved between sessions.
    • A Set Defaults button returns all parameters to their defaults.

v3.2.3 9-May-21

The approach for GO evidence codes has been changed.
  • For existing sTCWdbs, there is a small schema update, and it is necessary to redo the runSingleTCW GO only options.
     
  • viewSingleTCW Basic GO
    • The columns and filters for evidence code now use the six GO-defined evidence categories instead of the 27 individual evidence codes.
    • The evidence category columns can be shown in a long or short format, where the long shows all evidence codes that were found in the category. The short format just shows a 'Yes' that it has EvCs in the category.
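One way to picture the long and short category columns is the Python sketch below. The grouping is an assumption based on the Gene Ontology Consortium's evidence-code guide, with the high-throughput codes folded into Experimental to give six groups. It lists 26 codes while the note above mentions 27, so TCW's actual table differs in at least one code. Leaving a category blank when no code is found is also an assumption.

```python
from typing import Dict, Iterable, List

# Assumed grouping based on the GO Consortium evidence-code guide;
# TCW's own category table may differ.
EVIDENCE_CATEGORIES: Dict[str, List[str]] = {
    "Experimental": ["EXP", "IDA", "IPI", "IMP", "IGI", "IEP",
                     "HTP", "HDA", "HMP", "HGI", "HEP"],
    "Phylogenetic": ["IBA", "IBD", "IKR", "IRD"],
    "Computational": ["ISS", "ISO", "ISA", "ISM", "IGC", "RCA"],
    "Author": ["TAS", "NAS"],
    "Curator": ["IC", "ND"],
    "Electronic": ["IEA"],
}

# Reverse lookup: evidence code -> category name.
CODE_TO_CATEGORY = {
    code: cat for cat, codes in EVIDENCE_CATEGORIES.items() for code in codes
}


def category_columns(codes: Iterable[str], long_format: bool = True) -> Dict[str, str]:
    """One value per category: the codes found (long) or 'Yes' (short); blank if none."""
    found: Dict[str, List[str]] = {cat: [] for cat in EVIDENCE_CATEGORIES}
    for code in codes:
        cat = CODE_TO_CATEGORY.get(code)
        if cat is not None and code not in found[cat]:
            found[cat].append(code)
    return {
        cat: ((",".join(hits) if long_format else "Yes") if hits else "")
        for cat, hits in found.items()
    }
```

For the codes IDA, IMP and IEA, the long format would show "IDA,IMP" under Experimental and "IEA" under Electronic, while the short format shows "Yes" in both.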

v3.2.2 4-May-21

  • Terminology changes:
    • GO: Domain => Ontology, the corresponding column heading is GO
    • GO: Ontology abbreviations bio=>BP, cel=>CC, mol=>MF
    • GO: DE => Enrich
    • Evidence code: EC => EvC
  • runDE
    • The GOseq results are multiple hypothesis testing corrected (see goSeqBH.R).
    • The results are in 'oResults' instead of 'results'.
  • viewSingleTCW
    • Decimal Display
      • There is a new option to highlight p-values that are less than a given amount. See this panel and its Help for more information
  • Bugs
    • On the Main table, if the Row # column is moved in the table, the Copy Table and Export table of columns did not work right.
    • On Basic Hit table, if there were no GOs, the DE P-values could not be viewed in the Seq-Hit mode.
    • Recent tiny bug on Basic GO Annotation: For the DE #Seqs, it was supposed to read the original DE cutoff from the database, which was not working.
  • Software Details
    • All terminology changes are in Globalx and the GO abbreviation mapping is Static.goTermMap(). BasicGOTablePanel and MainTable were querying for p-value columns; they now get them from Metadata.
    • All Exports files have been rechecked - no problems found.

v3.2.1 19-Apr-21

Update existing sTCW databases for this release: There is a small sTCWdb schema change, which will be updated the first time you access the database.
  • runDE
    • The built-in GOseq has been changed to an R-script. This allows the user to make changes to the script or provide their own method.
    • NOTE: The goSeq results are exactly the same as before and are not corrected for multiple hypothesis testing. This has been made clear in the documentation, and the user can now alter the goSeq.R script to add the correction.
    • Some changes to the interface make it clearer.
  • Software Details
    • The GO metadata has been moved from assem_msg.goDE column to two new columns in table libraryDE. This required changes in Schema, MetaData, Overview, runDE.QRprocess, annotator.DoGO

v3.2.0 29-Mar-21

Existing GOdb and sTCWdb: If you use GO evidence codes or EC (enzyme code), you will want to recreate the GOdb (i.e. runAS) and re-run GO Only from runSingleTCW.

  • runAS
    • Parsing go-basic.obo:
      • As a sanity check, all UniProt GO are checked for existence in the go-basic.obo file.
      • Was not saving the last GO.
    • Parsing UniProt:
      • Only the last EC (enzyme code) was being saved; now all ECs under "RecName" are saved.
        Also, the text after the EC code is removed, e.g. "2.4.1.- {ECO:0000256|RuleBase:RU362057}" is "2.4.1.-".
      • The GO evidence code "IC" was being stored as "UNK".
      • A count of the number of obsolete GOs in an UniProt file is printed to the terminal.
        The obsolete GO is still in the GOdb and the obsolete GO in sTCWdb will have a prefix of "obsolete" and no neighborhood.
  • runSingleTCW
    • Evidence Codes: Bug - The evidence codes were wrong (I don't know what release broke this).
      This has been fixed and only the evidence codes from the UniProt hits with the GO assigned will be shown.
  • viewSingleTCW - Basic GO
    • The interface has been updated to indicate that only assigned Evidence codes are used.
    • Slight changes to Show... to make the 'alt_id' (replacements) more obvious.
    • Changed some terminology to be compatible with AmiGO, e.g. "GO term" changed to "GO ID".
  • Updated the runAS documentation to describe parsing the OBO file.
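The runAS EC clean-up above (keep every EC under "RecName", drop the trailing evidence text) can be sketched as follows. This is a minimal illustration, not TCW's parser. The class and method names are hypothetical. The sketch assumes the caller has already extracted each value that follows "EC=" on the UniProt lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the TCW runAS source.
final class EcTrimSketch {

    // Keep only the EC code, e.g. "2.4.1.- {ECO:0000256|RuleBase:RU362057}" -> "2.4.1.-".
    // Assumption: the input is the text after "EC=", possibly ending with ';'.
    static String trimEc(String raw) {
        String s = raw.trim();
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        int cut = s.indexOf(' ');
        return cut < 0 ? s : s.substring(0, cut);
    }

    // Collect every EC for an entry instead of keeping only the last one.
    static List<String> collectEcs(List<String> rawValues) {
        List<String> ecs = new ArrayList<>();
        for (String raw : rawValues) {
            String ec = trimEc(raw);
            if (!ec.isEmpty()) {
                ecs.add(ec);
            }
        }
        return ecs;
    }

    public static void main(String[] args) {
        System.out.println(trimEc("2.4.1.- {ECO:0000256|RuleBase:RU362057}"));
    }
}
```

Splitting at the first space is enough for the example in the notes. A real parser would also need to handle the UniProt line layout itself.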


TCW Version 3.1

v3.1.9 25-Mar-21

  • Overview
    • Rearranged Overview.
    • Added some GO statistics
  • viewSingleTCW
    • Basic GO:
      • Show... Neighborhood list with relations has the added relations of replaced_by and replaces.
      • Added an item under Table... called "Each GO's parents with relation", which produces a popup or file with output like:
        --------->      GO:0000019      bio     regulation of mitotic recombination
        is_a            GO:0000018      bio     regulation of DNA recombination
        --------->      GO:0000027      bio     ribosomal large subunit assembly
        is_a            GO:0022618      bio     ribonucleoprotein complex assembly
        part_of         GO:0042255      bio     ribosome assembly
        part_of         GO:0042273      bio     ribosomal large subunit biogenesis
        
      • The GO Help has been updated.
    • Sequence Detail: A few small problems with GO, e.g. the number of "Unique GOs" was always 0.
  • On-line TCW Tour:

v3.1.8 19-Mar-21

  • runAS - building the GOdb mysql database of GOs and UniProts.
    • The GOs were obtained from the downloaded go_<date>-termdb-tables.tar.gz, which has been discontinued. Now runAS downloads the go-basic.obo file and builds the GOdb from scratch.
    • The GO Trim function is disabled (probably permanently).
    • Previous GOdbs should still work, except that the GO_slims will be ignored.
  • Documentation
    • All Web documentation pages use the same styles.
    • The Java Help pages for singleTCW all use the same style (the multiTCW Help pages will be updated later).
  • viewSingleTCW
    • Basic GO - Show... - The Neighbor List
      • The list is sorted by description to match GO Amigo display.
      • The "replaced_by" relation is now shown in this list.
  • Software Details
    • DoGO.java has been replaced with DoOBO.java. Besides this file, the only other changes were slight ones for the GO_slims.
    • On Linux only, for non-UniProt annoDBs only, in Sequence Details: GO...: (1) the Hits option showed an error even though it worked; (2) selecting a hit with no GOs showed an error, though it also worked.
    • The Web documentation uses HTML5 commands that pass BBedit tests, though it does not process "<!--#include virtual=" statements. It also passes the HTML validator, except that the validator complains about formatted words in columns or bullet lists (e.g. BBedit is formatted within a bullet list). This is used extensively and seems to be allowed by all major web browsers.
    • The Java HTML renderer seems to only partially respect <style> commands, so the Help pages are a mix of HTML5 and HTML4. They have been tested on various Mac Java installations and Linux Java 8.1.

v3.1.7 21-Feb-21

Update existing sTCW databases for this release: it is a good idea to reload the hits to make the database consistent with the interface (e.g. Best Bitscore has replaced Best Eval).
  • runSingleTCW
    • Best and Rank:
      Best Bits: The Best Eval (best e-value) has been replaced with Best Bits. This is much better than using the e-value, which depends on the database size.
      Best Anno: (1) Previously, the e-value of the candidate best anno hit had to be reasonably close to the Best Eval e-value; it no longer checks the e-value or bitscore.
      (2) The rules for determining an un-informative hit have been slightly changed, e.g. "unnamed protein product", which is found in nr.gz, has been added to the un-informative list.
      Best with GO: This is now the highest bitscore hit with GOs.
      Rank: This is assigned by sorting the hit list for a given sequence and annoDB on the bitscore, with ties broken by the e-value. Previously, TCW used the input order, where the e-value takes precedence over the bitscore.
    • hitWarnings.log: Only problems with hits that are found in the tab file are recorded in this file (this used to be done for all entries). The demoTra example TwoOsSeqs.fa sequence file has hits to NR entries that are too long and have been truncated; hence, they are written to the hitWarnings.log file.
  • viewSingleTCW
    • Sequence Filter: (1) A filter has been added for bitscore. (2) Filters have been added for Best Eval != Best Anno and GO!=Bits&GO!=Anno.
    • Sequence Detail: (1) In the hits table, the bitscore has been moved before the e-value to make it obvious that the sort is on the bitscore before the e-value. (2) A "Copy... Selected Hit Description" option has been added.
  • Demo files
    • UniProt_demo has been updated to Jan-2021 along with the associated GO file.
    • Other_demo has been added, specifically for demoTra. It has examples of NCBI protein nr.gz, PlantTFDB-all_TF_pep.fas, and NCBI RNA Sorghum bicolor sequences.
    • demoTra - A README explains the various input files, as this project has examples of most input files.
  • /scripts
    • A new Python script called formatNCBIrna.py formats the header lines of an NCBI RNA file for use with TCW.
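The v3.1.7 Rank rule (sort on bitscore, break ties on e-value) can be sketched as a comparator. The `Hit` class and its fields are hypothetical stand-ins, not TCW classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the v3.1.7 Rank rule; class and field names are illustrative only.
final class RankSketch {

    static final class Hit {
        final String id;
        final double bitscore;
        final double evalue;
        int rank;

        Hit(String id, double bitscore, double evalue) {
            this.id = id;
            this.bitscore = bitscore;
            this.evalue = evalue;
        }
    }

    // Highest bitscore first; when bitscores tie, the smaller e-value wins.
    static final Comparator<Hit> BY_BITS_THEN_EVAL =
            Comparator.comparingDouble((Hit h) -> h.bitscore).reversed()
                      .thenComparingDouble(h -> h.evalue);

    // Sorts a copy of the hits for one sequence and annoDB and assigns ranks 1..n.
    // The first hit in the returned list corresponds to "Best Bits".
    static List<Hit> assignRanks(List<Hit> hitsForSeqAndAnnoDb) {
        List<Hit> sorted = new ArrayList<>(hitsForSeqAndAnnoDb);
        sorted.sort(BY_BITS_THEN_EVAL);
        for (int i = 0; i < sorted.size(); i++) {
            sorted.get(i).rank = i + 1;
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("A", 100.0, 1e-20));
        hits.add(new Hit("B", 150.0, 1e-10));
        hits.add(new Hit("C", 100.0, 1e-30));
        for (Hit h : assignRanks(hits)) {
            System.out.println(h.rank + " " + h.id);
        }
    }
}
```

In this example, B ranks first on bitscore alone even though its e-value is the worst of the three. C ranks ahead of A only because their bitscores tie.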

v3.1.6 11-Feb-21

  • File Chooser
    • Only shows files with the expected file extension. However, the File Format: drop-down can be changed to "All Files" in order to select any file.
    • It always starts in the default directory. If the user changes to a different directory, the File Chooser will use that directory the next time it is used during the session.
    • When any file is read, its basic file structure is verified.
  • Little fixes
    • runMulti: (1) Removed obsolete parameters. (2) The 'blastn' defaults had an extra incorrect parameter (introduced in v3.1.6). (3) The Search "Setting" button now stays active after the search is done (so the parameters can be viewed).
    • viewMulti: Seq Detail: if the GOs were displayed and then Next was used, the GOs from the previous page were still displayed.
  • Software
    • All methods (both for single and multi TCW) now use the File Chooser in the "util.file" class, with the exception of the runAS and sTCW "Count file generator".

v3.1.5 28-Jan-21

  • Compressed files allowed (i.e. suffix .gz)
    • runAS - reading .fasta files and .dat files
    • runSingleTCW - reading .fasta files, .qual files and expression files
    • All File Chooser popups that restrict selected files to FASTA files now allow the ".gz" suffix. The accepted suffixes are "fa", "fasta", "fna", "ffn", "faa", "frn", "fa.gz", "fasta.gz", "fna.gz", "ffn.gz", "faa.gz", "frn.gz".
  • runAS
    1. The AnnoDB.cfg button (previously TCW.anno) now creates a file with the suffix ".cfg".
    2. File sizes are included on the trace output.
  • runSingleTCW:
    1. The File Chooser for "Import AnnoDBs" restricts files to those with the ".cfg" suffix, which works for the AnnoDB_UniProt_<date>.cfg and sTCW.cfg files.
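A file-name check matching the FASTA suffix list from v3.1.5 could look like the sketch below. This is a hypothetical helper, not TCW's File Chooser filter. Case-sensitive matching is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; TCW's File Chooser filter code is not shown in these notes.
final class FastaSuffixSketch {

    // Base FASTA suffixes from the v3.1.5 notes; each may also be followed by ".gz".
    static final List<String> FASTA_SUFFIXES =
            Arrays.asList("fa", "fasta", "fna", "ffn", "faa", "frn");

    // True if the name ends in an accepted FASTA suffix, optionally compressed (.gz).
    // Matching is case-sensitive here, which is an assumption.
    static boolean isAcceptedFasta(String fileName) {
        String name = fileName;
        if (name.endsWith(".gz")) {
            name = name.substring(0, name.length() - ".gz".length());
        }
        int dot = name.lastIndexOf('.');
        return dot >= 0 && FASTA_SUFFIXES.contains(name.substring(dot + 1));
    }

    public static void main(String[] args) {
        for (String f : new String[] {"demo.fa.gz", "demo.fasta", "demo.txt", "demo.gz"}) {
            System.out.println(f + " -> " + isAcceptedFasta(f));
        }
    }
}
```

Stripping a single ".gz" before testing the base suffix keeps the list of twelve accepted suffixes down to six entries.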

v3.1.4 20-Jan-21

This release deals mainly with the singleTCW "Similar Pairs" option and updating to the latest DIAMOND.

Changes that will affect existing sTCWdbs:

  1. Schema changes for sTCWdb, which will be updated the first time the sTCWdb is used.
  2. The "Similarity Pairs" .tab file names have changed; old ones will not be recognized.
Additions and modifications:
  • For both singleTCW and multiTCW
    • DIAMOND
      • DIAMOND has been updated from v0.9.22.123 to h4.0.6.144.
      • I had optimized parameters for the first DIAMOND release, which are now counter-productive. The DIAMOND defaults are now used except for "--max-hsps 1", since TCW only uses the first HSP.
    • View
      • The Export methods all behave the same now.
      • The ruler for the pairwise alignment "Line" view has more precision for N:1 displays.
      • Find Hits: The File Chooser specifies file type extensions ".fa" and ".fasta".

  • singleTCW database
    • There is a schema change for the sTCWdb (release v5.4). All changes are for "Similar Pair" processing.

  • runSingleTCW
    • Similar Pairs
      • NEW option to compare translated ORFs for NT-sTCW and AA sequences for AA-sTCW, where either DIAMOND or BLAST can be used.
      • The interface no longer allows specifying an existing blast tab file, but there are instructions on how to provide one.
      • The sTCW.cfg keywords have been changed along with the file names.
    • Reduced ORF finding and Overview computation output to terminal. The output for "Annotation" is clearer.
    • Changing search parameters did not always work as expected; this has been fixed.
    • AnnoDB Panel has a new "Reset to Default".
    • The File Chooser specifies file type extensions ".tab", ".fa" and ".fasta".

  • viewSingleTCW
    • Align
      • NEW The "Trim" function has been added to show only the aligned regions.
      • On the "Line" view, the arrow at the end of the line reflects the orientation.
      • For NT alignments, "R" is after the sequence name if it is reverse complemented.
    • Similar Pairs
      • NEW There is a new "Hit Type" column indicating whether the hit was NT (blastn), AA (tblastX) or ORF (translated ORFs). There is a "Pair Filter" to search on this column. This is only relevant for NT-sTCW.
      • There are new "Copy..." options for the sequence or reverse complement.
    • SeqDetail Frame: The ORFs are listed in order by frame (3,2,1,-1,-2,-3).
    • Tiny bug
      • For Assembled sequences: if a gap was added to the consensus, it would in certain circumstances result in the incorrect frame for a protein hit.
      • Export would sometimes hang, which has been fixed.

  • Software
    • All Export methods are in sng.util.ExportFile
    • The AlignBasePanel was split into PairBasePanel and ContigBasePanel

v3.1.3 30-Dec-20

This release continues to improve on the multiTCW MSA scores and viewing alignments.
  • runMultiTCW Align
    • The score1 and score2 column values are stored in the database to be displayed for a MSA.
    • Min-Max normalization is applied to the Sum-of-pairs scores so that they are within 0-1.
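Min-Max normalization rescales a set of values linearly so that the smallest maps to 0 and the largest to 1. The sketch below shows the technique on a plain array. These notes do not say which set of Sum-of-pairs scores TCW normalizes over, nor how it treats a zero range. Both are assumptions here.

```java
import java.util.Arrays;

// Hypothetical sketch of Min-Max scaling; which set of Sum-of-pairs scores TCW scales over is not stated here.
final class MinMaxSketch {

    // Maps each score s to (s - min) / (max - min), so the results lie in [0, 1].
    // If all scores are equal the range is zero; returning 0.0 then is an assumption.
    static double[] normalize(double[] scores) {
        double[] out = new double[scores.length];
        if (scores.length == 0) {
            return out;
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        double range = max - min;
        for (int i = 0; i < scores.length; i++) {
            out[i] = (range == 0.0) ? 0.0 : (scores[i] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        // Sum-of-pairs scores can be negative; after scaling they fall within 0-1.
        System.out.println(Arrays.toString(normalize(new double[] {-50.0, 0.0, 50.0})));
    }
}
```

For the input -50, 0, 50 this prints [0.0, 0.5, 1.0].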

  • viewMultiTCW
    • Align
      • For blosum=0, a different color is used (it was the same color as <0).
      • All align panels have a "Trim" option, which removes hanging sequence.
      • MSA
        • Header: The description and global score are shown above the alignment
        • In the Sequence view, clicking on an AA character will show the scores and composition of the column
        • If score methods other than the defaults were used for building the mTCW database, they will be used for running the external MSA program (i.e. MSA...). The scores are written to a file in /ResultsAlign.
        • Tiny bug: the stop was not removed on perfect alignments
    • Other
      • Hit Table: Add "Copy hit Sequence"

  • runSingleTCW
    • If the sTCWdb was annotated with NT and AA hits, the Best Annotated Hit will be an AA hit (these are used for ORF finding). Note, the NT can still be the Best Eval.

  • viewSingleTCW Align
    • Align
      • For hit and pair alignments, there is now an N:1 zoom (increase) along with the original 1:N zoom (decrease)
      • The "View" line has been modified so that all options show the current state (like viewMulti does).
      • Tiny interface cleanups:
        • For AA-sTCWdb (i.e. the database was created with protein sequences), the alignment process produced incorrect warnings (the alignment was fine). The Find Hits did not work for the AA-Seqs.
        • For an NT hit aligned to an NT sequence, the "Align..." pop-up produced incorrect warnings.
        • For the "Seq" view: (1) the Ruler extended to the length of the NT sequences for an AA display. (2) Only 15 bases of the overhang were supposed to be shown, but that did not always work.
        • For selecting a sequence for the "Align...", the bottom sequence had to be selected; now either can be.
    • Other
      • Basic AnnoDB Hit: Copy Hit Sequence: the name is copied with the sequence in FASTA format.

  • For both viewSingleTCW and viewMultiTCW
    • The information popup windows have selectable text.
    • Find Hits - AA-ORFs (sTCW) and AA-Seq (mTCW) are the default Subject with Diamond as the default search program. For "Delete search files", not all files were being deleted. A few confusing aspects of the interface were fixed, mainly dealing with parameters.

  • Software
    • Move single align classes to sng.viewer.panels.align (only AlignPairOrig is shared)
    • Split MainToolPanel into PairViewPanel and ContigViewPanel.
    • Renamed sng classes to correspond to cmp classes of similar type.
    • Removed any references to 'sng' in /util or /cmp
    • Cleaned up a lot of dead code from graphics routines

v3.1.2 4-Dec-20

This release concentrates on improving the multiTCW MSA scores and viewing alignments.
  • runMultiTCW
    • Schema (mTCW db6.3): small change to add storage for the two score methods.
    • The MSA score2 was the external MstatX Trident method. Score2 has been changed to a built-in Wentropy, which is exactly the same as the MstatX Wentropy except that the score is (1 - score), so that a value of '1' is the most conserved.
    • The user can request that Score1 and/or Score2 be an MstatX method, using a command line option (see ./runMultiTCW -h). Scores can be updated simply by setting a command line flag and then running "Run Stats" with the "Compute MSA and score" option.
    • Sum-of-Pairs can produce large negative numbers if the cluster is large and has many gaps; hence, TCW will not compute this score in that case (this is a temporary fix; the case is rare, but it can hang the machine).
    • The Hit Cluster method has a new parameter that, when set, requires all sequences in a cluster to have a hit with all other sequences.
  • viewMultiTCW
    • Since the MSA scores can be different from the defaults, they are now referred to as Score1 and Score2:
      • At the bottom of the Overview, it is stated what methods were used.
      • In the Cluster table, mousing over the column Score1 and Score2 in the column panel shows the method in the lower left hand corner.
    • The alignment views have been improved:
      • The 3-views for a pair will scroll now and allow the viewing of the sequence.
      • The default view is to show non-synonymous amino acids in one color and the synonymous ones in another. It still does that, but the colors have been altered to make them easier to detect.
      • An option has been added to view the "Zappo" physicochemical coloring.
      • For the "View Seq" option:
        • The characters that are different from the consensus are in bigger bold face to make them more obvious.
        • The "Dot" option allows all matches to be shown as a '.', which makes it easier to view the differences.
      • For the "View Line" graphical option:
        • The Zoom now has both 1:N and N:1 options, where 1:N decreases the graphic size, and N:1 increases the size. The N:1 provides more space between the vertical hashes (differences), which makes them easier to distinguish.
  • Updates to the Help and on-line documentation.
For developers: The alignment classes are now in a new seq.align package. The SumStats method is renamed to PairSumStats. A ./runMultiTCW -x will remove clusters before re-scoring clusters.

v3.1.1 10-Nov-20

It is often easier to use the TCW generated sequence names (e.g. Os_00001) than the sequence names supplied in the file (e.g. NM_001048268.2), but it is important to be able to access the original names, hence, the following changes.
  • runSingleTCW
    • Bug fix: When Use Sequence Names from File was selected, it forced Skip Assembly to be unselected, whereas it was supposed to force it to be selected.
    • Annotate: It will no longer create the HitWarnings.log file unless something is written to it. This file has warnings such as the "Description" being longer than allowed.
  • viewSingleTCW
    Name               Use Sequence Names from File   Skip Assembly
    Orig ID            No                             Yes
    Longest            No                             No
    Orig ID or none    Yes                            Yes

    • Basic Query Sequence: Allow query on Orig ID or Longest unless the Seq ID != Orig ID. Make the column heading Orig ID or Longest according to the table above.
    • Sequence Detail: Show Orig ID or Longest in the top text area according to the table above, and add it to the Copy... drop-down.
    • Sequence Table: Make the column heading Orig ID or Longest according to the table above.
  • Updated the entire Tour.

v3.1.0 5-Nov-20

  • mTCW schema update to version 62 for the new Hit Table.
    • Current mTCW databases will be updated the first time they are viewed.
  • viewMultiTCW:
    • Made all tabs on the left and their tables summaries consistent for Sequences, Pairs and Clusters. Also made their column naming and layout more consistent.
    • Overview:
      • The "Explain" button, which explains what the numbers in the Overview mean, has been improved (albeit still a bit confusing). There are now two AAsim percents: the first is the percent of equal AA characters out of all aligned AA characters, and the second is the average of the %similarities of the individual pair alignments. The same two numbers exist for NTsim, AAcov, and NTcov.
    • New Hit Table and Filter:
      • This is a new table and filter. It only links to the Sequence Table.
    • Sequence Details:
      • Add "Copy...", which copies information for the selected seqID in the Pairs table or hitID in the Hit Table to the clipboard.
    • Sequence Table:
      • Add to the "Copy..." option "Hit Sequence"
      • Hit Align: (1) Allow only one sequence to be selected, (2) If Best Hit is NT, translate to AA before aligning to AA sequence.
    • Pairs Table:
      • Add "Cluster" button to show all clusters from the selected set of pairs
    • Cluster Table:
      • Add "Next/Prev" when viewing Pairs or Sequence selection.
    • Pair and Sequence Filters:
      • If a minimum of "0" was entered, "All" was returned; now it correctly remains "0".
    • Updated the online Help.
  • runMultiTCW:
    • Add 'Last Update' to the Overview.
    • Reduced the memory a little on 'Pair Stats'.
  • runSingleTCW:
    • ORF finder: If all Best Hits are nucleotide, the Best Hit and Markov options do not work; this information has been added to the Help.
  • Updated the Tour for runMulti.
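The two AAsim percents in the v3.1.0 Overview item can differ because one pools counts across all pairs and the other averages per-pair percentages. The sketch below contrasts the two. The `PairCounts` class is hypothetical. It also assumes each pair's %similarity is simply equal AA characters divided by aligned AA characters.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch contrasting the two AAsim percents described for the v3.1.0 Overview.
// Assumption: each pair's %similarity is taken as equal AA chars / aligned AA chars.
final class AaSimSketch {

    // Per-pair counts from an alignment (illustrative class, not a TCW class).
    static final class PairCounts {
        final int equal;
        final int aligned;

        PairCounts(int equal, int aligned) {
            this.equal = equal;
            this.aligned = aligned;
        }
    }

    // First percent: equal AA chars out of all aligned AA chars, pooled over every pair.
    static double pooledPercent(List<PairCounts> pairs) {
        long equal = 0;
        long aligned = 0;
        for (PairCounts p : pairs) {
            equal += p.equal;
            aligned += p.aligned;
        }
        return aligned == 0 ? 0.0 : 100.0 * equal / aligned;
    }

    // Second percent: the average of each pair's own %similarity.
    static double meanPairPercent(List<PairCounts> pairs) {
        double sum = 0.0;
        int n = 0;
        for (PairCounts p : pairs) {
            if (p.aligned > 0) {
                sum += 100.0 * p.equal / p.aligned;
                n++;
            }
        }
        return n == 0 ? 0.0 : sum / n;
    }

    public static void main(String[] args) {
        List<PairCounts> pairs = Arrays.asList(new PairCounts(90, 100), new PairCounts(10, 50));
        System.out.println(pooledPercent(pairs));   // long alignments dominate
        System.out.println(meanPairPercent(pairs)); // every pair counts equally
    }
}
```

Take one 90/100 pair and one 10/50 pair. The pooled percent is about 66.7, while the per-pair average is 55.0.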

2020 releases

v3.1.2   04-Dec-20   runMulti    Improve MSA scores
v3.1.0   05-Nov-20   viewMulti   Add Hit Filter and Table
v3.0.4   04-Sep-20   runSingle   TPM for normalization instead of RPKM
v3.0.3   16-Jun-20   Package     Moved external and external_osx to Ext/linux and Ext/mac


v3.0.5 7-Oct-20

  1. runMultiTCW:
    • New BestHit: Create Clusters based on Best Annotation Hit shared by HitID or Description.
    • Small changes: (1) Slight change in assigning the best hit per cluster. (2) Slight change in Closure algorithm to round %Similarity and %Coverage (BBH was already doing that).
    • Updated multiTCW Help pages and on-line documentation.
  2. viewMultiTCW:
    • New Find Hits: This works like the viewSingleTCW "Find Hits:", i.e. search against sequences in database or a protein database.
    • Sequence table: New alignment types for "Pairwise...":
      • AA to sequence best hit: The best hit for each sequence will be aligned to it.
      • AA to cluster best hit: (Clusters only) The assigned cluster best hit is aligned to each sequence.
    • Cluster Table: New Table... options to export counts or TPM values for each sequence in each cluster.
    • Sequence Filter: New HitID and Description search.
    • Sequence Table: The "Copy..." to clipboard of database sequences includes the sequence name.
    • Sequence Detail: In the Hit Table, changed "Start" and "End" to "aaCov" and "hitCov", where aaCov=(alignLen/aaLen)% and hitCov=(alignLen/hitLen)%. Added a Help page.
    • Moved the Sample tabs on the left to the bottom.
    • Export: (1) Sort GOs by domain, level, GOnum. (2) Bug fix: Export Table was replacing '3' with '-'.
  3. runSingleTCW:
    • The Annotation "Option" panel:
      • A new parameter that, when set, removes the {ECO...} part of the description. For example, the description Coatomer subunit zeta-2 {ECO:0000313|EMBL:OAY72132.1} would be changed to Coatomer subunit zeta-2.
      • The "Similarity Search" sub-panel is more intuitive (there are check-boxes to indicate performing a search).
    • Small bug: "Remove Annotation" did not remove the computed ORFs, so they were still shown in viewSingleTCW.
  4. viewSingleTCW:
    • When there are no count files loaded, the queries for them have been removed (they were meaningless).
    • Find Hit: This has been moved to under the "General" heading.
      Also, "View Selected Sequence" for Find Hit was using the Query, and has been changed to the Subject seqName.
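The viewMultiTCW Hit Table coverage columns above follow directly from the stated formulas. The sketch below shows them in code. The class and method names are hypothetical, and rejecting a non-positive length is an assumption.

```java
// Hypothetical sketch of the Hit Table coverage columns from v3.0.5:
// aaCov = (alignLen / aaLen)% and hitCov = (alignLen / hitLen)%.
final class CoverageSketch {

    // Percentage of a sequence of length len that is covered by an alignment of length alignLen.
    static double coveragePercent(int alignLen, int len) {
        if (len <= 0) {
            throw new IllegalArgumentException("length must be positive: " + len);
        }
        return 100.0 * alignLen / len;
    }

    static double aaCov(int alignLen, int aaLen) {
        return coveragePercent(alignLen, aaLen);
    }

    static double hitCov(int alignLen, int hitLen) {
        return coveragePercent(alignLen, hitLen);
    }

    public static void main(String[] args) {
        // A 150-residue alignment between a 300-residue sequence and a 200-residue hit.
        System.out.println(aaCov(150, 300));  // 50.0
        System.out.println(hitCov(150, 200)); // 75.0
    }
}
```

The same alignment length gives two different coverages, one relative to each side of the alignment.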

v3.0.4 4-Sept-20

This release is mainly some clean-up, along with the computation of TPM replacing RPKM.
  1. runSingleTCW:
    • Improve error recording on Build Database and Instantiate
    • Compute TPM for normalization instead of RPKM. However, RPKM can be computed by executing execAssm <project> -r.
  2. execLoadLib.pl is changed to the script execLoadLib; execAssm.pl is changed to the script execAssm. The scripts accept the parameter "-n" to execute without any prompts; this is good for batch processing.
  3. viewSingleTCW:
    • Bug fix: The "Seq #" column would not display if GOs were in the database.
    • Clicking a heading to sort a table did not work well on MacOS, which has been fixed.
    • For viewing the multi-line text alignment, a sequence does not need to be selected.
    • The longest sequence name is a column for loaded transcripts, as it provides the original name of the sequence. Also, the Basic Sequence panel allows the longest sequence name to be copied to the clipboard.
  4. Internal clean-up to remove Java warnings
  5. The /doc has been updated.
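The v3.0.4 switch from RPKM to TPM changes only the normalization order. TPM first divides counts by length, then rescales so each sample sums to one million. RPKM divides by both length and total reads at once. The sketch below uses the standard formulas and is not TCW's code. It assumes lengths are in bases and that the RPKM total is the sum of the given counts. TCW may define the total or the length differently.

```java
import java.util.Arrays;

// Hypothetical sketch of TPM versus RPKM for one sample (standard formulas).
// Assumptions: lengths are in bases, and the RPKM "total mapped reads" is the sum of the
// counts given; TCW's own implementation may define the total or the length differently.
final class TpmSketch {

    // TPM: reads per kilobase, then rescaled so the values in the sample sum to one million.
    static double[] tpm(long[] counts, int[] lengthsBp) {
        double[] rpk = new double[counts.length];
        double rpkSum = 0.0;
        for (int i = 0; i < counts.length; i++) {
            rpk[i] = counts[i] / (lengthsBp[i] / 1000.0);
            rpkSum += rpk[i];
        }
        double[] out = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            out[i] = (rpkSum == 0.0) ? 0.0 : rpk[i] / rpkSum * 1e6;
        }
        return out;
    }

    // RPKM: reads per kilobase per million mapped reads.
    static double[] rpkm(long[] counts, int[] lengthsBp) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        double[] out = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            out[i] = (total == 0) ? 0.0 : counts[i] / (lengthsBp[i] / 1000.0) / (total / 1e6);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] counts = {100, 300};
        int[] lengths = {1000, 3000};
        System.out.println(Arrays.toString(tpm(counts, lengths)));
        System.out.println(Arrays.toString(rpkm(counts, lengths)));
    }
}
```

TPM values always sum to one million within a sample, so their proportions are directly comparable across samples. RPKM values do not have this property.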

v3.0.3 16-June-20

  1. Update for MySQL v8:
    • The code has been updated to work with MySQL v8.
    • The MySQL schema has a slight change, which TCW applies the first time an sTCW database is accessed.
    • The database is now opened with characterEncoding=utf8.
    • ./runSingleTCW -v to check important MySQL variables
  2. External programs:
    • The directory external has been renamed Ext/linux and external_osx renamed Ext/mac.
    • Blast has been added to the package in Ext/linux and Ext/mac.
    • TCW was tested with MacOS 10.15 (Catalina), where some of the external programs that worked on MacOS 10.9 no longer worked; these have been updated.
    • TCW does a better job of checking and providing errors, and Trouble Shooting has better instructions on how to fix problems.
  3. TCW can no longer be run as an Applet.
  4. SingleTCW has a little more information on the first line of the overview.
  5. The documentation has been updated.

v3.0.2 5-May-20

runDE: The R-script/edgeRglm.R has been changed from importing the p-values to importing the FDR values.
Github: The /doc directory of html instructions has been added to Github.


v3.0.1 30-Oct-19

  • SingleTCW:
    1. MariaDB 10.4.7 broke the assembler, which has been fixed.
    2. The overview is available after 'Build Database' in order to show the libraries loaded.
  • MultiTCW:
    1. For runMultiTCW, selecting a different database from the dropdown is a little faster.
    2. In viewMultiTCW, the KaKs columns and queries are only available if the data has been loaded.

TCW Version 3.0

v3.0 10-Aug-19

  1. With this release, the code is on github.com/csoderlund/tcw.
  2. There was a major internal code cleanup.



Lost release notes after 1Apr19

The release notes from 1Apr19 to 10Aug19 were lost in a corrupted backup.

TCW Version 2.13

Release 1Apr19
  1. If you are not using TCW v2.12, see the green highlighted points for the v2.12 release.
  2. The multiTCW algorithm has been improved for assigning shared descriptions and for distinguishing between KaKs 'not run' versus 'null value'; to get the new assignments, remove the Pairs and Clusters and re-add them. (However, viewMulti works fine without the update.)
General:
  1. Added the search program Diamond to the /external and /external_osx directories.
  2. The Show Stat function bug: If values were missing, the Median was wrong.

runSingleTCW

  1. Assigning bestAnno: If the bestAnno is not the same as the bestEval, and the bestEval is not a good description but has a much better E-value (exponent is 80% higher), then the bestAnno is set to the bestEval. This increases the number of "uncharacterized" hits, but if the bestEval has an E-value of 0.0 and the bestAnno has an E-value of >1E-30, it makes more sense to have the bestAnno be "uncharacterized".
  2. Minor bug: If an annoDB had a type other than 'sp', 'tr', 'pr', 'nt', e.g. PlantTFDB uses the format of >tf|B3_1 B3 family protein {KFK32619.1} OS=Arabis alpina where 'tf' is the type, then the viewSingleTCW General columns of #Protein or #Nucleotide would be incorrect.

viewSingleTCW

  1. Find hit: when using Diamond for searching, it reformatted the database every time; this has been corrected.

runMultiTCW

  1. Assigning a representative hit:
    • The algorithm for assigning a representative hit to a cluster or hit pair has been re-written to give more weight to the hits with GOs.
    • Hit pairs are assigned a representative hit if a shared hit can be found, else, "NoShare" is assigned.
    • For clusters, if any sequence has a hit, the cluster will have a hit even if only one sequence has the hit.
  2. If pairs had been aligned and KaKs values added, and then new clusters were added, all pairs were realigned in order to write them to file. Now only the pairs that have not been aligned will be aligned and written to file.
  3. Improved error handling for when the permissions are wrong in the /external or /external_osx directories.
  4. Bug: the Multi align function could result in too many open files.
  5. Bug: The CDS/5'UTR/3'UTR lengths and CpG were not getting written to the database (recent bug).

viewMultiTCW

  1. The Cluster and Pairs filters have a Yes/No toggle to find clusters with or without a description substring.
  2. The Export GOs has a count cutoff, and an option for outputting the description and count.
  3. The Pairs table will have a hitName of "NoShare" if there is no shared description.
  4. Sequence details has new 'best' column on hit table.

TCW Version 2.12

Release 25Feb19: This has improvements to multiTCW for GO support along with making the sTCW and mTCW GO support consistent (though sTCW still has more support than mTCW).
  1. If you are not using TCW v2.11, see the green highlighted points for the v2.11 release.
  2. There is a small database change for multiTCW, which will be applied the first time you view an mTCWdb. If the GOs were previously added, they will be removed and will need to be re-added, which will use the new algorithm.

runMultiTCW

  1. The "Add GOs" option was rewritten so that only GOs associated with the hits in the database were added. That is, only the top 5 hits for each sequence are in the mTCWdb, so only the associated GOs are added.
  2. Clusters have minPCC and %PCC columns, which were being computed for clusters that were added before the pairwise PCC was computed; this has been fixed.

viewMultiTCW

  1. Sequence Detail: add options to view the GOs for all or selected hits.
  2. Cluster table: the GO Export is more user-friendly.
  3. The pairwise AA and NT alignments were cropping the overhangs before display; now the entire alignments are shown.

viewSingleTCW

  1. Sequence Detail: Add option to view "All GOs for selected hit".
  2. The Export functions work in a more systematic way across tables.
  3. Bug fix on Basic GO annotation: if the p-values had not been added using runDE, the "#Seq" option did not work.

TCW Version 2.11

Release 9Feb19: This includes some speedups and a new option in viewSingleTCW for GOs.
  1. If you are not using TCW v2.10, see the green highlighted points for the v2.10 release.
  2. The new GO feature allows you to view the number of DE sequences associated with a GO DE value.
    To use this feature, you need to rerun GOseq in runDE.

runDE

  1. Speedup to adding the results to the database, which makes a difference when using MariaDB.
  2. The cutoff used for GOseq is saved in the database and displayed on the Overview. The variable names written to R have been changed to reflect their meaning.
  3. The built-in edgeR and DEseq2 have been removed, as they are redundant with the R-scripts.
  4. The "Top N" only worked correctly for the first set of conditions (this option is seldom used since the p-value cutoff is better except for experimentation).

runMultiTCW

  1. The algorithm for assigning shared annotation to a cluster had changed, where it is not the best annotation hit.
  2. Speedup on "Build Database" and "Add GOs", which makes a "huge" difference when using MariaDB.
  3. The Closure algorithm now guarantees that best hit pairs are in a cluster together.
  4. Whether the NT blastn is checked in the 'Compare sequence' Settings is saved in mTCW.cfg.

viewSingleTCW: changes for Basic GO panel.

  1. A new option for the DE set of filters allows the #Seqs column to reflect the number of DE sequences for the GO term, or the number of up-regulated or down-regulated sequences. The "View Sequences" option only shows the sequences associated with the #Seqs number. The TCW Basic GO Help provides much more information about this new feature.
  2. The "Table..." menu has a new option called "Export/merge #Seqs for table GOs", which allows subsequent columns of #Seq to be added to a .xls file, e.g. the number of sequences that are DE for one comparison compared to another. Excel can then be used to view the associated graph.

viewMultiTCW

  1. Cluster table: in the Export all cluster GO option, the Per Sequence vs Overall option has been removed. The output can be the viewed columns or just the cluster names and counts (useful for input to REVIGO).
  2. Sort is changed to be case-insensitive.
  3. In the sequence table, multiple sequences can be selected in order to view all their clusters.
  4. Fixed a couple of obscure bugs in "Show Column Stats".

General changes to views:

  1. The drop-down menus have been changed to use a type that includes arrows at the end to scroll.
  2. The terminology of "Substitution" for Blosum scores has been standardized.
  3. The term "Overlap" has been changed to "Coverage", which indicates how much an alignment covers a sequence.
  4. The mTCW.cfg now uses the terminology MTCW, STCW, CLST to replace the old terminology of CPAVE, PAVE, POG.
  5. The precision on the "Column Stats" has increased.
  6. Other little terminology clean-ups.

TCW Version 2.10

Release 21Dec18: The biggest addition is the scoring of multiple alignments in runMultiTCW, along with many small upgrades and two bug fixes.
  1. The "/libraries" project directories have been merged with the "/projects" project directories.
    The first time you run runSingleTCW, it will ask you if you want it to merge the directories. No files or directories will be deleted, only moved. Anything that can't be moved will remain, and "libraries" will be renamed "libariesOld".
  2. When parsing the UniProt .dat file, TCW treated OS lines that had no additional information as a species name with a '.' at the end, and if it did have additional information, there was no '.'; hence, there could be species such as 'Aspergillus niger' and 'Aspergillus niger.'. The '.' is now removed from the end so it is just one species.
    To apply this change in an existing sTCW, remove the annotation and reload it.
  3. The five 'high throughput' GO evidence codes have been added.
    If you have an existing sTCWdb, the schema is updated on the first view (update to sTCW db5.3)
    To get the HT GO evidence codes, you need to execute runAS and select "Build GO" to rebuild the TCW-GOdb; then, to update an sTCW database, select "GO only" from runSingleTCW (or ./execAnno <project> -g).
  4. The sTCW overview has many changes along with a "Reproduce" popup that explains how to reproduce the numbers in the overview. The overview should be updated to correspond with the "Reproduce".
    viewSingleTCW <project-name> -o
  5. The biggest addition is the scoring of multiple alignments for multiTCW.
    If you view an existing mTCWdb, the schema is updated on the first view (update to mTCW db5.9). Then run the "Run Stats" command with "Compute cluster scores" selected in Settings.

runAS

  1. The five high throughput GO evidence codes have been added.

runSingleTCW

  1. The log files are all written to the sub-directory "logs", where subsequent runs append to the files. Load Data writes to "load.log", Instantiate writes to "inst.log", and Annotate writes to "anno.log".
  2. The html directory has been renamed to OverviewHTML (all overviews are written to this directory). The project ORF directory has been renamed to "orfFiles".
  3. All functions write the elapsed time and memory (memory is very approximate). For time, the nanoTime function is used.
  4. An option has been added to allow the user to decide whether the SwissProt hit should take precedence for Best Anno.
  5. The Best-Anno hit is calculated as follows: (1) It is the first hit in the sorted list that has a good description. (2) If the above SwissProt option is selected, then if there is a SwissProt hit with a good annotation that has an exponent within 20% of the best anno hit, it is used.
  6. The "un-annotated only" option has been removed from the panel that adds an annoDB (the speed of Diamond makes this unnecessary).
  7. The ORF finder has multiple small changes. The hit cutoff defaults have been changed to better apply to Diamond hit results. The codon frequencies are no longer written to BestORFScores.txt. The algorithm has been slightly changed.
  8. The overview has quite a few changes.
  9. The "Edit" for the AnnoDBs did not work. Now it is possible to edit the "taxonomy" only.

viewSingleTCW

  1. BUG FIX: Basic Hit - the GO, Interpro, etc columns were off by one.
  2. Show stats - bug fix, same as in viewMultiTCW.
  3. The string sort was changed to case-insensitive.
  4. The Sequence Detail Frame panel had the Markov and Codon scores swapped, this has been fixed.
  5. Basic GO - the five high throughput GO evidence codes have been added.
  6. ./viewSingleTCW project-name -w writes to the terminal the overview up to the "Processing Information", then exits.

runMultiTCW

  1. The Stats options panel has an option to score the multiple alignments of all clusters in the database. It runs MAFFT on each cluster and computes the Sum-of-pairs score and the Trident score, where the Trident score is computed by the MstatX program (see github.com/gcollet/MstatX). The two scores are added to the database. The MAFFT and MstatX programs are contained within the TCW distributable.
  2. The conLen (consensus length) and sdLen (stddev of the sequence lengths in the cluster) are computed and added to the database. The multiple alignment is also saved.
  3. The overview has replaced the 'Taxa' section with 'Average and Stddev', which is the average and standard deviation of the four new columns.
  4. The search program parameters are saved, and they are listed at the end of the Overview.
  5. Bug fix: The Cluster PCC and minPCC columns were not getting populated.
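A sum-of-pairs score adds up a pairwise score for every pair of rows in each alignment column, and then sums the columns. The Python sketch below shows that basic calculation on an alignment that has already been computed. The scoring function `toy_score` is a hypothetical stand-in: +1 for a match, -1 for a mismatch, -2 for a residue against a gap, and 0 for a gap against a gap. The notes do not say which substitution matrix or gap rules TCW uses. The Trident score from MstatX is not reproduced here.

```python
from itertools import combinations
from typing import Callable, List


# Hypothetical toy scoring, for illustration only (not TCW's matrix or gap rules).
def toy_score(a: str, b: str) -> int:
    if a == "-" and b == "-":
        return 0          # gap aligned with gap contributes nothing
    if a == "-" or b == "-":
        return -2         # gap aligned with a residue
    return 1 if a == b else -1


def column_sp_scores(msa: List[str],
                     score: Callable[[str, str], int] = toy_score) -> List[int]:
    """Sum-of-pairs score for each column of a multiple alignment."""
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all aligned rows must have the same, non-zero count")
    return [sum(score(a, b) for a, b in combinations(col, 2)) for col in zip(*msa)]


def sp_score(msa: List[str], score: Callable[[str, str], int] = toy_score) -> int:
    """Total sum-of-pairs score: the sum of the per-column scores."""
    return sum(column_sp_scores(msa, score))


if __name__ == "__main__":
    msa = ["MKV-L", "MRVAL", "MKVAL"]
    print(column_sp_scores(msa))  # [3, -1, 3, -3, 3]
    print(sp_score(msa))          # 5
```

The per-column list corresponds to the kind of column sums that a score file can hold. The total is a single number per cluster.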

viewMultiTCW

  1. Clusters Table has four new columns: conLen (consensus length), sdLen (StdDev of the AA sequence lengths for the cluster), Score1 (Sum-of-pairs), Score2 (Trident). The Help explains more about these.
  2. When MUSCLE or MAFFT is run on a set of sequences, the input file, output file, and score files are written to the ResultAlign sub-directory; only the files for the last alignment are kept. File score1.txt has the column sums for Sum-of-pairs and score2.txt has the column sums for Trident.
  3. MAFFT can now be run on the NT sequences or the AA sequences. Occasionally MAFFT will fail; TCW now catches its failure.
  4. A "multiDB" option displays the alignment saved in the database.
  5. Show Stats - the following two problems have been fixed: it did not work for floating point numbers, and it did not work if there were fewer than 6 rows in the table. A "sum" column has been added.

viewSingleTCW and viewMultiTCW

  1. TCW creates three subdirectories: ResultHit, ResultExport, ResultAlign. All searches occur in the subdirectory ResultHit. The default export directory is ResultExport. ResultAlign is used by viewMultiTCW as described above in item 2.

TCW Version 2.9

Release 21Oct18: Requires Java 1.7 (instead of 1.6). Added User Remark to sTCW and MAFFT alignments to mTCW.
The first time you view an existing sTCW database, the database will be updated from schema db5.1 to db5.2.
The MAFFT code is in the jar file in the directories /external for linux and /external_osx for mac.

Release 14Oct18: Change to the viewSingleTCW overview of the annoDBs, slight change to the Diamond TCW defaults, and added filters.
The first time you view an existing sTCW database, the database will be updated from schema db5.0 to db5.1.
To update the overview, execute "viewSingleTCW <id> -o"

Release 4Oct18: Speedups for runAS and runSingleTCW building databases, especially if using MariaDB 5.5 for MySQL.

Release 28Sept18: This version is mainly about using Diamond and Blast within singleTCW.
If you have existing sTCW and mTCW databases, see hitResults below.

runSingleTCW

  1. Search annoDBs:
    • Parameters were established that result in Diamond getting close to the same hits as Blast.
      It uses "--top 20" instead of "-k 25"; hence, there can be many hits when using TrEMBL. Therefore, there is now an internal cutoff of 25 hits per annoDB per sequence. (14Oct18)
      The "--masking 0" option has been added to the Diamond TCW default parameters, as tests show that good hits were not being reported (this reduction of false-negatives does increase the false-positives a little).
    • When adding annoDBs, the search program option of "TCW Select" is no longer available. If the Diamond path is in the HOSTS.cfg file, Diamond is the default. Diamond 0.9.22 was tested.
    • The ability to use Legacy blast has been removed.
    • hitResults: The results are now written to the project directory called "hitResults". Previously, they were written to the directory called "uniblasts"; if you have an existing directory of this name, you need to rename it "hitResults" or remove it.
    • The overview has multiple changes in presenting statistics on the results. (14Oct18)
  2. ORF Finder: Add the ability to use the %Similarity or E-value for determining what hits can be used to determine the ORF frame. Also, it was ignoring the hit frame if there were multiple hit frames for the sequence; it no longer does that.
  3. Bug fix: When adding remarks, if there was a tab in the remark, it would not display right; hence, tabs are changed to spaces.
  4. User Remark: The TCW remark and User Remark are now separate, so when you "Add Remarks", it goes into User Remark. (21Oct18)
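The internal cap of 25 hits per annoDB per sequence can be sketched as a group-then-truncate step. In this hypothetical Python version, hits are grouped by (sequence, annoDB) and ordered by E-value, with ties broken by the higher bitscore. The `Hit` record and that ordering are assumptions made for illustration. They are not taken from the TCW source.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):  # hypothetical record layout
    seq_id: str
    anno_db: str
    hit_id: str
    evalue: float
    bitscore: float


MAX_HITS = 25  # the per-annoDB, per-sequence cap stated in the notes


def cap_hits(hits: List[Hit], max_hits: int = MAX_HITS) -> Dict[Tuple[str, str], List[Hit]]:
    """Keep at most max_hits hits for each (sequence, annoDB) pair.

    Assumed ordering: lowest E-value first, ties broken by higher bitscore.
    """
    groups: Dict[Tuple[str, str], List[Hit]] = defaultdict(list)
    for h in hits:
        groups[(h.seq_id, h.anno_db)].append(h)
    return {key: sorted(group, key=lambda h: (h.evalue, -h.bitscore))[:max_hits]
            for key, group in groups.items()}
```

For example, 30 TrEMBL hits for one sequence would be cut to the best 25, while a SwissProt annoDB with 3 hits for the same sequence would keep all 3.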

viewSingleTCW

  1. The "Blast" option is changed to "Find Hit", and:
    • The ability to use Diamond has been added.
    • Sequences can now be searched as follows:
      • For NT-sTCW: (1) the nucleotide sequences in the database, (2) the translated ORFs from all sequences, or (3) a user selected protein database.
      • For AA-sTCW: (1) the protein sequences in the database, or (2) a user-selected protein database.
    • The user can input any set of parameters.
    • A paste from clipboard button has been added. The search commands will only be written to the terminal if the "Trace" label is checked.
  2. Sequence table: Add "Export hit sequences for table", which will write a file of all Best Eval and/or Best Anno hits from the table.
  3. Sequence columns: Changed #Taxonomy to #AnnoDB. (14Oct18)
  4. Sequence Frame: If the Best Eval and Best Anno frames are different, the hit information for both of them is shown.
  5. Filters have been added on (1) the number of taxonomies that a sequence has hits for, and (2) the taxonomy and/or DBtype of the best eval or best anno per sequence.
  6. Basic Hit:
    • The number of unique sequences is shown along with the number of unique hits.
    • There is a new filter on Rank=1 and a column for Rank. There is also a new filter on "Hit-align". (14Oct18)
  7. Basic Sequence: Added "User" remark search. (21Oct18)
  8. Overview: (14Oct18)
    • The AnnoDB table has multiple changes to the statistics.
    • For "Cover>=50" and "Cover>=99", the N can be changed with "viewSingleTCW <project-name> -o -o1 N -o2 M", which recomputes the overview.
  9. Sequence Columns: the #SwissProt, etc now uses integer sorting instead of string sorting. (14Oct18)
  10. Bug fix: The #Seq column did not sort in the Sequence table, and the #Pair column did not sort in the Pairs table.

runMultiTCW

  • Previously, NT blast and statistics were disabled if there was even one AA-sTCWdb as input. Now, if there are at least two NT-sTCWdbs as input, these functions are allowed.
  • hitResults: The results are now written to the project directory called "hitResults". Previously, they were written to the directory called "blastResults"; if you have an existing directory of this name, you need to rename it "hitResults" or remove it.

viewMultiTCW

  • The NT blast and statistics are available if there is more than one NT-sTCWdb, regardless of whether there is an AA-sTCWdb.
  • From the sequence table, MAFFT has been added for multiple alignments. (21Oct18)

runAS

  1. Fixed a recently added bug in the "TCW.anno" function.
  2. The "TCW.anno" writes the date for each saved annoDB.

TCW Version 2.8 (mTCW database 5.8) 21Aug2018

Release 21Aug2018, Update 5Sept2018 (this includes a fix for a very stupid recent bug in runAS)

Improvements to MultiTCW: For existing mTCWdbs, update the statistics as follows: Select your project with runMultiTCW. Use the "Remove..." option to remove the Pairs and Clusters. Then add the Pairs and Clusters again and "Run Stats".

runMultiTCW

  1. The AA-BBH pairs are computed when the blast file is loaded, and used by the BBH clustering routine. A new column was added to the MySQL schema for this, and is displayed in viewMultiTCW. Note: there is no corresponding column for NT-BBH as it is rarely used; hence, it is computed on the fly.
  2. Build Database: speedup for loading uniprots.

viewMultiTCW

  1. Sequence table: add Pairs option to show all pairs for selected sequences in the Pairs table. 4Sept2018 - fixed a bug.
  2. Sequence filter: add a filter on datasets, i.e. show all sequences from a given dataset.
  3. Pairs table: Add a new column to indicate the BBH pairs (though they may not be in BBH clusters if they do not pass other parameters). Also, the "Table Stats" are now run in the background.

runAS

  1. It is no longer possible to get past GO tables from the GO website. Hence, that option has been removed.
  2. Speedup for building the GO database.
  3. 5Sept2018 - replaced the 'Download' labels with 'Build' since the function performs more than a download.
  4. 5Sept2018 - BUG fix - the directory name for the full download of SwissProt or TrEMBL was made incorrectly.

runSingleTCW

  1. Some buttons have been moved to more logical places and 'Run DE' was removed (it should always be run from the command line).
  2. Speedup on adding the GO information.

runDE

  1. EdgeR.R is replaced with edgeRclassic.R and edgeRglm.R.
  2. 5Sept18 - Made CPM (counts per million) the default filter and made the parameters behave like edgeR. This gives the same result as the edgeR cpm filter, e.g. keep <- rowSums(cpm(y, normalized.lib.sizes=FALSE) > N) >= M, where N and M are runDE parameters.
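For readers who work outside R, the CPM filter in the edgeR expression above can be written out in a few lines. This Python sketch computes counts per million from the raw column totals, which corresponds to `normalized.lib.sizes=FALSE`. It keeps a gene when at least `m` samples have a CPM above `n`. The function names are hypothetical, and this is not runDE's code.

```python
from typing import List, Sequence


def cpm(counts: Sequence[Sequence[float]]) -> List[List[float]]:
    """Counts per million for a genes x samples matrix.

    Library sizes are the raw column totals (no normalization factors).
    A sample with a total of zero would raise ZeroDivisionError.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [[row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)] for row in counts]


def cpm_filter(counts: Sequence[Sequence[float]], n: float, m: int) -> List[bool]:
    """True for each gene whose CPM exceeds n in at least m samples."""
    return [sum(value > n for value in row) >= m for row in cpm(counts)]


if __name__ == "__main__":
    counts = [[10, 0, 5], [990, 1000, 995]]   # 2 genes x 3 samples
    print(cpm_filter(counts, n=6000, m=2))    # [False, True]
```

Summing the booleans `value > n` mirrors `rowSums(cpm(...) > N)` in R. Comparing that count with `m` gives the keep/drop decision.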

Demos - the count files have been changed; they are not compatible with existing demo sTCWdbs. The Demo annoDBs have been updated, and now work for both the demo and ex (example) projects. The demo GO tables are part of the packages so they do not have to be downloaded.

MySQL connection - a change was made to the way TCW connects with the database, which speeds up the building of TCW databases (GO, sTCW, mTCW) on some machines, depending on configuration.

TCW Version 2.7 (mTCW database 5.7) 8Aug2018

Improvements to MultiTCW.

runMultiTCW

  1. The mysql database schema has been altered so that all percentages are stored with more precision. Your database will be updated the first time you access it with runMultiTCW or viewMultiTCW. However, you need to re-run the statistics (as explained above) to get the new precision.
  2. Overview:
    • The percentages were for the whole dataset, e.g. the percent of exact codons from all codons. This has been changed to be the average of the individual percentages, e.g. the average over all pairs of each pair's percent of exact codons. This makes the numbers consistent with the new "Column Stats" in viewMultiTCW, which also provides the standard deviation, median and ranges.
    • The KaKs quartiles were incorrect in some cases; this has been fixed, and the p-value table for the overview has been changed. The standard deviation method was changed from population to sample calculation.
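A two-pair example shows how the old pooled percentage differs from the new average of per-pair percentages. It also shows how the population and sample standard deviations differ. In this Python sketch, each tuple is a hypothetical (exact codons, total codons) count for one pair. The sketch illustrates the arithmetic only and is not the mTCW code.

```python
import statistics
from typing import List, Tuple


def pooled_percent(pairs: List[Tuple[int, int]]) -> float:
    """Old style: percent of exact codons out of all codons in the dataset."""
    exact = sum(e for e, _ in pairs)
    total = sum(t for _, t in pairs)
    return 100.0 * exact / total


def mean_percent(pairs: List[Tuple[int, int]]) -> Tuple[float, float]:
    """New style: average of the per-pair percentages, plus the sample SD."""
    per_pair = [100.0 * e / t for e, t in pairs]
    return statistics.mean(per_pair), statistics.stdev(per_pair)


if __name__ == "__main__":
    pairs = [(90, 100), (10, 20)]
    print(pooled_percent(pairs))                 # about 83.33
    print(mean_percent(pairs))                   # (70.0, about 28.28)
    print(statistics.pstdev([90.0, 50.0]))       # 20.0 (population SD)
```

For these two pairs the pooled value is 100/120, about 83.3%. The average of the per-pair values 90% and 50% is 70%. The sample standard deviation divides by n-1 and gives about 28.3, compared with 20.0 from the population formula.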

viewMultiTCW

  1. Column Stats: A new option has been added to all three tables to show the averages, etc. of all numeric columns shown. For the Pair Table, the option is on the 'Show' pull-down. For the Cluster and Sequence tables, the option is on the "Tables" pull-down.
  2. A new 'Explain' button is on the overview page to explain how the different values were computed.

runSingleTCW

  1. The input count file (i.e. expression counts) can have decimal numbers for the counts.

TCW Version 2.7 (mTCW database 5.6) 28July2018

Improvements to MultiTCW.

runMultiTCW

  1. The protein sequences for an NT-sTCW are now created when the sTCWdbs are loaded; this makes the requirement of having an input protein file obsolete.
  2. Pairwise alignments are saved in the database so the statistics can be recreated from the alignments.
  3. The last release had some loss of accuracy in the statistics in order to fix a problem; the accuracy has been restored with the stored alignments.
  4. Improve summary and document how the statistics are computed in doc/mtcw/summary.html.
  5. The GC and CpG statistics have been changed to use the Jaccard Index (intersection/union).
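The Jaccard index of two sets is |A intersect B| / |A union B|. Below is a minimal Python sketch for one aligned pair. For illustration only, it assumes that the sets are the alignment columns where a sequence has a G or C (for GC), or where it starts a C-G dinucleotide with no gap between them (for CpG). How mTCW actually defines the sites is documented in doc/mtcw/summary.html. Returning 0.0 when both sets are empty is also an assumption.

```python
from typing import Set


def gc_columns(aligned: str) -> Set[int]:
    """Alignment columns where this sequence has a G or C (gaps ignored)."""
    return {i for i, base in enumerate(aligned.upper()) if base in "GC"}


def cpg_columns(aligned: str) -> Set[int]:
    """Columns where a C is immediately followed by a G (a gap breaks the pair)."""
    s = aligned.upper()
    return {i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Intersection over union; assumed to be 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    s1, s2 = "ACGT-CG", "ATGTACC"
    print(jaccard(gc_columns(s1), gc_columns(s2)))    # 0.75
    print(jaccard(cpg_columns(s1), cpg_columns(s2)))  # 0.0
```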

viewMultiTCW

  1. For an AA-mTCW (i.e. at least one AA-sTCW was input), no NT filters, columns or options are shown.
  2. If there are no GOs in the mTCWdb, GO columns and options will not show on the interface.
  3. Bug fix: the 'Copy Cluster ID' on the Cluster table did not work.

runSingleTCW: condition names are shown in viewSingleTCW in the same order as in the input file (they used to be sorted).

TCW Version 2.6

Improvements to MultiTCW. Most improvements are in the 6/22/18 release; the rest are dated. The latest release is 7/16/18.

runMultiTCW

  1. Major change: The coding statistics were based on the CDS nucleotide alignments; this has been changed to use the AA alignment and retro-fit the codons to the alignment; this produces better coding statistics when there are gaps in the alignment. This AA-NT alignment works with the TCW AA files, but not with ESTscan.
  2. Major change: The BBH method has been extended to work in the following two modes when there are more than two input sTCWdbs:
    1. Select the sTCWdbs to be used as input to the BBH clustering routine. If there are more than two selected, it will create mutual BBH clusters, e.g. if three sTCWdbs are selected, it will create clusters of size 3 where they are all best hit with each other.
    2. If no sTCWdbs are selected, then it will run the BBH for all pairs of sTCWdbs in the database.
  3. mTCW was not outputting pairs for KaKs analysis if their alignment had more than 10 gaps. This restriction has been removed since the user can filter via the viewMultiTCW interface.
  4. Added the column "minPCC" for the minimal PCC value of a group, where the PCC is computed on the RPKM.
  5. A check has been added to make sure that names in the AA file match the database seqIDs.
  6. The maximum allowed size of the method prefixes has been changed from 3 to 5.
  7. The search program for the self-blast can be set to diamond or ublast (if they exist); the code has been altered to save this setting in mTCW.cfg.
  8. Bug fix: If the mTCW database has datasets with a mix of upper and lower case start characters for the sequence names, there were multiple problems because the default sorts in Java and MySQL differ.
  9. Bug fix: Some of the %PCC values for groups were wrong.
  10. The Summary for Pairs has more information, which is computed when the pairs are added. (7/1/18)
  11. Improvements for self-blast "Settings": (7/1/18)
    1. The filter option has been removed because it was not useful.
    2. A "Cancel" has been added.
    3. The parameters are saved to the mTCW.cfg file.
    4. Made it more user-friendly.
  12. Database optimization: (1) Added index to the pair table. (2) Remove extras from the Unique Hits tables. Both of these changes will be made on existing databases the first time they are viewed. (7/5/18)
  13. An "Add GOs" button has been added so this step is separate from building the database; this was done because it takes a long time (e.g. a couple of hours on a large database), so the user can choose if and when to add them. (7/5/18)
  14. On the Overview, the Cluster Set table of counts has been slightly changed to be more meaningful. (7/5/18).
  15. Major change (7/16/18): The overall statistics have been changed:
    1. They were only correct if only BBH clusters were in the database. This has been fixed.
    2. The user has the choice of which cluster set pairs are used for coding sequence statistics and KaKs.
    3. The CpG and GC statistics have been removed from the overview, and the computed statistics have more round-off error than before (both these issues will be fixed in a later release).
    4. Changed the Statistics "Settings" to give more control over when the KaKs files are written.
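The mutual BBH clusters described in item 2 can be sketched in Python as follows. The sketch assumes a hypothetical lookup `best` that maps (query sequence, target sTCWdb) to the query's best hit in that database. One database serves as the anchor. Each of its sequences proposes at most one partner per other database, and the group is kept only if every pair in it is a bidirectional best hit. Any similarity or coverage cutoffs that TCW also applies are left out.

```python
from itertools import combinations
from typing import Dict, List, Tuple

BestHits = Dict[Tuple[str, str], str]  # (query seq, target db) -> best hit in that db


def is_bbh(a: str, db_a: str, b: str, db_b: str, best: BestHits) -> bool:
    """a and b are each other's best hit in the other's database."""
    return best.get((a, db_b)) == b and best.get((b, db_a)) == a


def mutual_bbh_clusters(dbs: List[str], seqs_by_db: Dict[str, List[str]],
                        best: BestHits) -> List[Tuple[str, ...]]:
    """Clusters with one sequence per selected db where all pairs are BBHs."""
    first, others = dbs[0], dbs[1:]
    clusters = []
    for a in seqs_by_db[first]:
        members = [a] + [best.get((a, d)) for d in others]
        if any(m is None for m in members):
            continue  # a has no hit in some selected database
        if all(is_bbh(x, dx, y, dy, best)
               for (x, dx), (y, dy) in combinations(zip(members, dbs), 2)):
            clusters.append(tuple(members))
    return clusters
```

For mode 2 (no sTCWdbs selected), the same function could be called once per pair of databases, for example `for d1, d2 in itertools.combinations(dbs, 2): mutual_bbh_clusters([d1, d2], seqs_by_db, best)`.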

viewMultiTCW

  1. In the CDS alignment view, the AA-NT alignment is used as described in (1) above.
  2. Added "and" and "or" options on Cluster Set filter for Pairs and Seqs
  3. Added Prev/Next on Pairs table when selected from the Cluster table so one can step through the pairs of a cluster.
  4. Improvements to column names and organization, and clarified jargon of the interface.
  5. For when there are more than two datasets in a mTCW database, a new Pairs filter allows the datasets to be selected.
  6. The sequence detail panel has multiple little improvements.
  7. The sequence table showed DE values <1 or >1 in a weird way; now they are all shown 'as is' except 3/-3 are shown as "-".
  8. Bug fix: Some of the Pair queries did not work.
  9. For the Pairs, it no longer shows statistics queries and columns if there are one or more protein sTCWs as input (statistics are not computed in this case). (7/1/18)
  10. Tool tips that show in the lower left-hand corner are attached to all queries. (7/1/18)
  11. The "Pairs with:" pairs filter has been changed to list the sTCWdb pairs. This was done to speed up the query, but it can still be slow on large databases. (7/1/18)
  12. The SQL query for the sample tables is only computed on the first viewing of a database, and is retrieved from the database thereafter. (7/1/18)
  13. The PairID is shown on the Sequence Detail panel, and it can be searched on in the Pair Filter. (7/5/18)
  14. The Results List has a "Remove Selected" added and the panel has a more informative layout. (7/5/18)
  15. The number of filtered rows that will be downloaded is shown before the download from MySQL begins. (7/5/18)
  16. The Pairs table has a new option to "Show Stats" which will show the summary statistics for the pairs in the table that have coding statistics and KaKs (7/16/18).
  17. The Pairs table allows the selection of multiple lines followed by "Sequences"; this allows the alignment of user selected pairs. (7/16/18).

runDE
If a DE value was >1 or <1, it was being put into the database as 2 or -2. It now keeps its original value.

runAS
The TCW.anno option, which writes the annoDB information to a file for runSingleTCW, now also writes the GO database name. (7/1/18)

runSingleTCW

  1. Bug fix: if an sTCW database was only annotated with nucleotide annoDBs, the ORF finder failed.
  2. The Import AnnoDBs will load the GO database name along with the annoDBs. (7/1/18)

TCW Version 2.5

Release dates 5/3/18 through 5/31/18: The May 3rd release had major changes to the ORF finder, including replacing the Hexamer score with the 5th-order Markov model score. The subsequent releases involved incremental changes to the ORF finder, which are described in TCW ORF finder.

If you have existing sTCW databases, you can simply download the jar files from here and follow the instructions to put them in your TCW_2/java/jars directory. Then for each sTCW database that needs updating, execute:

   ./execAnno project_name -r -n

Major changes for v2.5:

runSingleTCW ORF finder

  • The hexamer score has been replaced with a 5th-order Markov model (see Haas et al. 2013, Nature Protocols 8:1494).
  • The sequences with multiple hit frames are evaluated to determine whether to use the best hit for the frame selection.
  • The rules for selecting the best ORF have changed a little (see ORF finder).
  • Various changes to output file names and their content.
  • Changes to the remark assigned to the sequence by the ORF finder and the summary information (additional changes on 8/21/18).
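The 5th-order Markov score can be sketched as a log-likelihood ratio. Each base is scored by its probability after the preceding five bases under a coding model, which here is also conditioned on codon position. That probability is divided by the base's probability under a background composition model. This Python sketch follows the spirit of the approach cited above. The add-one smoothing, the frame handling, and the assumption that the training CDSs and the scored ORF start at codon position 0 are illustrative choices, not TCW's documented settings.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

K = 5          # 5th-order: each base is conditioned on the preceding 5 bases
BASES = "ACGT"


def _usable(ctx: str, base: str) -> bool:
    return base in BASES and all(c in BASES for c in ctx)


def train_coding_model(cds_seqs: Iterable[str], k: int = K) -> Tuple[Counter, Counter]:
    """Count (frame, context, base) and (frame, context) in in-frame CDSs."""
    joint, context = Counter(), Counter()
    for seq in cds_seqs:
        seq = seq.upper()
        for i in range(k, len(seq)):
            ctx, base = seq[i - k:i], seq[i]
            if _usable(ctx, base):
                joint[(i % 3, ctx, base)] += 1   # i % 3 = codon position
                context[(i % 3, ctx)] += 1
    return joint, context


def background_model(seqs: Iterable[str]) -> Dict[str, float]:
    """Single-base frequencies with add-one smoothing."""
    counts = Counter(b for s in seqs for b in s.upper() if b in BASES)
    total = sum(counts.values())
    return {b: (counts[b] + 1) / (total + 4) for b in BASES}


def markov_score(orf: str, joint: Counter, context: Counter,
                 bg: Dict[str, float], k: int = K) -> float:
    """Sum of log(P_coding(base | frame, context) / P_background(base))."""
    orf = orf.upper()
    score = 0.0
    for i in range(k, len(orf)):
        ctx, base = orf[i - k:i], orf[i]
        if not _usable(ctx, base):
            continue
        frame = i % 3
        p_coding = (joint[(frame, ctx, base)] + 1) / (context[(frame, ctx)] + 4)
        score += math.log(p_coding / bg[base])
    return score
```

A higher score means that the ORF looks more like the training CDSs than like background sequence. A typical use is to compare the scores of alternative frames for the same transcript.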
viewSingleTCW
  • The Sequence Detail hit table has two new columns to show the percent overlap of the sequence and the hit.
  • The Sequence Detail Frame display has been changed. It no longer highlights codon or hexamer usage. Changing the "ORF/Nt" pull-down to "Scores/AA" shows the 6-frame scores for the current displayed ORF. The CDS region can be highlighted, and the hit region can be highlighted using italics or blue font.
  • The Sequence Alignment panels provide options for the UTRs and Blast Hit region to be highlighted (added 5/21/18).
Bug fixes:
  • runSingleTCW: Fixed a bug in Remove Remarks, where the option to remove only TCW-added remarks did not work (5/12/18).
    Fixed a bug where the "Exec GO only" was running the entire annotation (5/21/18).
  • runMultiTCW: Fixed a bug where it sometimes was not possible to select the "NT" blast (5/12/18).
  • viewSingleTCW: Fixed a few problems that caused errors, but nothing serious.
    Fixed a bug in the Sequence Frame panel where on a rare occasion the wrong hit coordinates were used (5/21/18).
    Fixed a bug in the Sequence GO panel where the assigned GOs for the selected hit were not shown if there were fewer than 4 (5/31/18).

TCW Version 2.4

2/20/18 - upgrades to the annotations

RunSingleTCW:

  • The annotation assigns a 'BestEval' and a 'BestAnno', which now work as follows:
    1. BestEval - the hit with the best E-value and best bitscore.
      Previously, it just used the E-value, even though two hits may have the same E-value but very different bitscores.
    2. BestAnno - a hit is marked as having 'good annotation' if it is (1) SwissProt or (2) does not have phrases in its annotation such as 'uncharacterized protein'. The hits marked as 'good annotation' are sorted by E-value and bitscore and the best one assigned as BestAnno.
      Previously, there was a restriction that the BestAnno had to have an E-value close to the 'BestEval', that restriction has been removed.
    You can view all hits per sequence in ViewSingleTCW along with their description, E-value, and bitscore.
  • The ORF finder has a few improvements:
    1. ORFs are now computed for sequences less than 30bp in length (previously they were not).
    2. ORFs with Start/Stop codons are given more preference over longer ORFs without Start/Stop.
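The BestEval and BestAnno rules in the list above can be expressed as two orderings over the same set of hits. In this Python sketch, `POOR_PHRASES` is a hypothetical list; the notes give only 'uncharacterized protein' as an example. The `Hit` fields are also assumptions made for illustration. The optional SwissProt precedence that was added in later versions is not modeled.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):  # hypothetical record layout
    hit_id: str
    evalue: float
    bitscore: float
    description: str
    is_swissprot: bool


# Hypothetical phrase list; the notes only cite 'uncharacterized protein'.
POOR_PHRASES = ("uncharacterized protein", "hypothetical protein")


def rank_key(h: Hit):
    return (h.evalue, -h.bitscore)  # lowest E-value first, then highest bitscore


def has_good_annotation(h: Hit) -> bool:
    if h.is_swissprot:
        return True
    desc = h.description.lower()
    return not any(p in desc for p in POOR_PHRASES)


def best_eval(hits: List[Hit]) -> Optional[Hit]:
    return min(hits, key=rank_key, default=None)


def best_anno(hits: List[Hit]) -> Optional[Hit]:
    return min((h for h in hits if has_good_annotation(h)), key=rank_key, default=None)
```

With two hits that both have E-value 0.0, the bitscore breaks the tie for BestEval. If the winning hit is described as 'Uncharacterized protein', BestAnno falls through to the best-ranked hit that has a good description.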
ViewSingleTCW:
  • For the Sequence Detail view, the bitscore is shown instead of the rank for hits.
  • The Export command for the main sequence table did not work if there were no GOs; that has been fixed.

TCW Version 2.3

2/6/18 - this release is mainly on upgrading the search functions.

RunSingleTCW:

  • The diamond tabular file has a ".dmnd.tab" suffix, usearch has a ".usch.tab" suffix, and blast has a ".tab" suffix.
  • The log file is appended to instead of keeping the last 10 old copies.
  • The MySQL command for creating the species table would hang when there were too many hits; it has been broken down into smaller queries.
  • Bug fix: The ORF finder was using NT hits, which caused bad ORFs; it now only uses AA hits.
RunMultiTCW:
  • Diamond and usearch can now be used for amino acid self-blast.
  • The logs directory has a file for each action, which is appended to each time the action is executed.
  • Small bug fixes with mixed (NT and AA) databases: (1) The NT blast would be executed, even if the interface indicated it would not be. (2) The "Add" was disabled if the database was removed.
Searching for both sTCW and mTCW:
  • Usearch was tested on Mac and Linux 10.0.240 32-bit (1/31/18). The blast-style tabular output of this version had some differences from blast and diamond, so TCW was modified for it.
  • Diamond was tested on Mac and Linux v0.9.17.118 (2/2/18). The "--sensitive" option was used in the defaults, but it can cause Diamond to take a long time, so it has been removed as a default; however, the user can add it back in the parameters window.
  • Blast was tested on Mac and Linux NCBI 2.7.1 (where the last release is 10/3/17). The "-task megablast" has been added for the blastn command, which is desirable for closely related sequences; the user can remove this on the parameter window.
  • Improvements for catching and reporting errors from the search programs.
viewSingleTCW: The "DB type" on the sequence detail window was always zero; this has been fixed.

TCW Version 2.2

Update: 1/22/18 - A recent download of UniProt had a few unknown evidence codes that caused runSingleTCW to fail on adding GOs - this has been fixed.

Update: 12/20/17 - made it more user-friendly if the search failed.

First Release: 12/12/17

runSingleTCW

  1. Upgraded to work with the most recent Diamond release (0.9.13.114, downloaded 12/10/17)
  2. Add "Copy" project.
  3. Add "Remove blast files from disk" to the "Remove..." menu.
  4. Small interface cleanup and more checking for input errors.

viewSingleTCW

  1. Basic GO: add 'domain' to the columns for the table
  2. Save column selection for all Basic Searches, i.e. if the user changes the columns, the change will be reflected the next time viewSingleTCW is run to view the project.

runMultiTCW

  1. The Run Stats "Settings" options were confusing -- made them more obvious.

viewMultiTCW

  1. A button has been added on each table to "Clear" the current column selection.

TCW Version 2.1

Second release 6 Nov 17: fixed a problem where it was not working with JDK v9.
First release 25 Oct 17

viewSingleTCW:
  1. N-fold: The N-fold filter has been re-written to have the same look as the N-fold column, and be easier to use. Also, a bug was fixed where if the divisor was zero, the N-fold pair was not shown in the table. The N-fold column now sorts like the DE columns by absolute value.
  2. Decimal numbers: The ability to change how decimal numbers are displayed has been moved from the Sequence Table Column panel to its own panel, which is shown in the upper left. This change was made because the formatting is used for all tables. Additionally, the formatting options have changed to give more flexibility and be simpler to use.
  3. The percentage of sequences in table is now displayed at the top of the table.

TCW Version 2.0

Date 23 July 17

This is a major release for the multiTCW, though there are also a few significant changes for singleTCW.

runMultiTCW

  1. The interface is much cleaner -- too many changes to list.
  2. Add Pairs is a separate step, which adds all pairs that have a blast hit, which can be queried in viewMultiTCW
  3. Add statistics to pairs is a separate step. This is relevant when the singleTCWs were created from transcripts (i.e. DNA). The statistics include:
    • Synonymous, nonsynonymous, and degenerate codons.
    • Transitions and transversions.
    • CpG sites and GC content.
    • Ka/Ks values, which are obtained from KaKs_calculator (Zhang et al. 2006). The pairs file is written to disk, the user runs KaKs_calculator on the file, then has runMultiTCW read in the results.
  4. The GO (gene ontology) are imported into the database on creation.
  5. The Transitive clustering routine has been replaced with Closure clustering, which guarantees that all sequences in a cluster have a blast hit with all other sequences in the cluster, and that each sequence has a user-supplied overlap and similarity score with at least one sequence in the cluster.
  6. Some improvements to the BBH clustering routine.
  7. Two new example projects are included that have good homology. They are referred to in the mTCW UserGuide.
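Closure clustering, as described in item 5, can be sketched greedily. A sequence joins the first cluster where it has a hit to every member and meets both the similarity and coverage cutoffs with at least one member. Otherwise, it starts a new cluster. The processing order, the `hits` dictionary layout, and keeping singletons are assumptions of this Python sketch, not details taken from TCW.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = FrozenSet[str]  # unordered pair of sequence IDs


def closure_clusters(seqs: List[str],
                     hits: Dict[Pair, Tuple[float, float]],
                     min_sim: float, min_cov: float) -> List[List[str]]:
    """Greedy closure clustering sketch.

    hits maps an unordered pair {a, b} to (percent similarity, percent coverage).
    """
    clusters: List[List[str]] = []
    for s in seqs:
        placed = False
        for cl in clusters:
            scores = [hits.get(frozenset((s, m))) for m in cl]
            if any(sc is None for sc in scores):
                continue  # no hit to some member: closure would be broken
            if any(sim >= min_sim and cov >= min_cov for sim, cov in scores):
                cl.append(s)
                placed = True
                break
        if not placed:
            clusters.append([s])
    return clusters
```

The result depends on the order in which the sequences are visited. A real implementation would therefore fix that order, for example by hit score. The notes do not say which order TCW uses.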
viewMultiTCW
  1. Pairs:
    • Add new Pairs Table.
    • Add filters on blast scores, and all statistics stated above.
    • Provides alignment of AA, NT, and CDS for pairs.
    • Improved the codon alignment algorithm and alignment display in viewMultiTCW.
  2. Cluster:
    • Add link to pairs and sequences tables, i.e. all pairs in the cluster are shown in the pairs table, and all sequences in the cluster are shown in the sequence table.
    • The RPKM and DE filters have been removed, as they were pretty meaningless. The new links to the sequences allow easy viewing of these details.
  3. Sequences - added filters:
    • Cluster methods
    • RPKM values
    • Sequences with Blast hits to different set (i.e. not from same singleTCW)
    • Has Annotation or has GO.
  4. Export on all three tables now includes exporting GOs and sequences.
  5. All three tables have 'Copy' button to copy various information to clipboard.
runSingleTCW:
The ORF finder has been updated. To use it on an existing sTCW database, just run ./execAnno <project_name> -r.
  • The ORF finder used to select the longest ORF that agrees with the hit frame, where the ORFs were basically the same as those found with the NCBI ORF_finder. When working with the BBH in multiTCW, it became clear that for de novo transcripts with a hit, it is better to use the hit ends, which is what it does now.
  • ORFs no longer will contain strings of n's.
  • The names of the ORF files in the ORF directory have been changed, and it writes the file of proteins (translated best hit) to the projcmp/AAfiles directory for easy use in runMultiTCW.
  • More elaborate GC statistics are computed to be shown in the overview.
viewSingleTCW:
  • Overview: Changed annoDB <40% to Eval>=50 and Total>=50. Includes the average length of UTRs and CDS, and GC and CpG for UTRs and CDS.
  • Basic GO query: Added Show 'Sequence - best hit with GO'. This helps to understand how the best e-value was assigned.
  • Basic Hit query: Added a column of all GOs and a column of #GOs for the hit. Added 'Show all assigned and inherited GOs for hit'.
  • Export on Main Sequence table:
    1. Allow appending to existing file.
    2. GO: add term_type filter. Add option for 'per sequence' and 'overall', where the first applies the evalue to each GO-seq bestEval and the second applies the evalue to the overall GO bestEval
    3. Remove writing PCC files.
    4. Check for write access (may not have it from Applet).
Bug fixes in singleTCW:
  • Even when downloading the GO MySQL database on the same day as UniProt, I have found them to be out of sync, which caused an error in runSingleTCW; this has been fixed.
  • Overview of viewSingleTCW: the coverage was wrong.
  • Basic GO Query: if #Seqs selected, E-value filter was ignored.
  • Pair alignment: gaps were not shown in the graphic view (this bug was only in the last release).

TCW Version 1.6.8 (03 Jan 17)

  1. runMultiTCW
    1. The user interface is much easier to use and provides better error messages.
    2. The user can now have mTCW run a self-blast on the protein or DNA sequences to be used for clustering.
    3. The BBH (Bi-directional best hit) has now been added as a clustering method. It works well only when two datasets are compared, since it only allows clusters of size two, i.e. pairs that are each other's reciprocal best hit over all hits.
    4. BBH and Transitive can use the protein or DNA blast files for clustering.
  2. viewMultiTCW
    1. The user can view the protein or DNA alignment. They can also click on an alignment to see the text form in the canonical multi-row format.
  3. viewSingleTCW
    1. Basic Hits: (1) Added a 'Show' button to show all columns for the selected row. (2) Added seqStart, hitStart, seqLen, and hitLen columns. (3) Added a copy to clipboard for the description or sequence of the selected hit. (4) Fixed a bug where the nucleotide alignment did not work if one sequence was upper case and the other lower case.
    2. Export: The tables now have column headings written as the first row, and the output from the different interfaces is more systematic.
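The BBH rule in the runMultiTCW list above can be sketched as follows, assuming hypothetical (query, subject, e-value) tuples from searching set A against set B and set B against set A. This illustrates reciprocal best hits in general, not the TCW code; ties are resolved by keeping the first hit seen.

```python
from typing import Dict, List, Tuple

# Hypothetical hit format: (query, subject, e-value).
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query to the subject with its lowest e-value."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {q: s for q, (s, _) in best.items()}


def bbh_pairs(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs that are each other's best hit, i.e. clusters of size two."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

With more than two datasets, a sequence can be the best hit of several others, which is why a size-two rule fits two-way comparisons best.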

TCW Version 1.6.7 (29 Nov 16)

  1. Sequence Pairs:
    1. TCW has an option to compare sequences in the database; this has been improved.
    2. The interface has more options for viewing Pairs.
  2. Basic GO:
    1. The DEtrim feature has been added back.
    2. An 'Add to table' has been added.
    3. The ability to delete rows has been added.
    4. Basic GO has new options that act on the entries in the GO table:
      1. Show All ancestors - shows the ancestors for the entries in the GO table.
      2. Show Longest Paths - determines all paths for all the entries in the GO table, then removes ones that are contained in a longer one.
      3. Export All Ancestors, Export Longest Paths, Export All Paths - writes the respective information to file.
  3. An 'Add to table' has been added to the Basic Seq and Basic Hit panels.
  4. An 'Align' option has been added to the Basic Hit panel; this can show the alignment of multiple sequences to a hit.
  5. Overview: Fixed a little bug on the percentages of GO DE terms.

TCW Version 1.6.6 (28 Oct 16)

  1. runSingleTCW: some additional changes were necessary to accommodate the changes to nr.gz, as a few things in viewSingleTCW did not work. To get the fixes, reload the annotations; i.e. ./execAnno <project> -q (this deletes the existing annotations, and reloads from the existing blast .tab files). Also, some slight changes were made to the rules for selecting the best annotation hit.
  2. viewSingleTCW:
    • Basic Hit Query: a filter on percent similarity was added, along with columns for it and alignment length.
    • For both Basic Hit and Basic GO, the query form was made more intuitive.
    • Sequence Details: a column was added to the Hit table to indicate the TCW selected Best E-val, Best Anno, and Best GO.

TCW Version 1.6.5 (10 Oct 16)

This release adds evidence codes (EC) to the single TCW database; it is necessary to run "Exec GO Only" from runSingleTCW to get the codes. No errors occur in viewSingleTCW if the EC have not been added.
  1. runDE:
    • The p-values can be read from a file versus being computed.
    • The overview can be updated from the runDE interface.
    • An 'Exit' button has been added which will also exit R.
    • A tiny bug has been fixed for the "GOseq" execution; except in an unusual situation, it will only have a minor effect on the p-values.
  2. runSingleTCW:
    • The evidence codes (EC) have been added.
    • The format for the NCBI nr database changed; TCW has been updated so that it will read the new or old format.
  3. viewSingleTCW - Basic GO Query:
    • The evidence codes can be queried from the "Basic GO" interface.
    • An additional filter has been added on the number of Sequences (gene products) associated with each GO.
    • The maximum number of GO levels was 16; it is now dynamic and may go above or below this number.
    • Multiple GOs can be selected in order to view the sequences associated with all selected GOs.
    • The DE trimmed feature is currently disabled, as recent changes have broken the algorithm.

TCW Version 1.6.4 (21Sept16)

  1. viewSingleTCW:
    • Status output has been added to the Blast page.
    • Basic Hit Query, the annoDB option: it was the case that only one or all annoDBs could be selected; now, any subset can be selected.
    • Basic GO Query: query on GO Slims has been added.
    • Sequence Details: an option has been added to view all assigned and inherited GOs.
  2. runSingleTCW:
    • GO slims can be added from the GO database or from a user supplied OBO file.
    • When building the database, indices have been added for the Hits so that queries for the Basic Hit tend to run faster.
    • For annotation: If the tabular search file is supplied, the corresponding FASTA file may be zipped (as before, it can also be zipped for diamond, but not for blast).

TCW Version 1.6.3

This release contains changes to viewSingleTCW.

(4Sept16)

  1. Fixed two bugs: (1) viewSingleTCW would not start up correctly if there was only one library. (2) Could not view an RPKM or DE column in Basic Hits if there was no GO annotation.
  2. GO Annotation: (1) Basic GO Annotation: There are more ways to look at ancestors and descendants. (2) Sequence Detail: The hits with inherited GOs can now be viewed. (3) Rearrangement of the GO query panel to make it more logical.

(16Aug16)
  1. Basic Hit Query: filters have been added for RPKM and DE p-values, i.e. the hits that pass the filters must have at least one sequence that passes the RPKM and/or p-value filter.
  2. All queries result in a description of the filter used, which is placed over the table.
  3. Basic Sequence: this has been simplified.

TCW Version 1.6.2 (8Aug16)

This release contains changes to viewSingleTCW.
  1. The tabs for Sequence Panels are positioned under their respective list instead of all under "Show All".
  2. Basic Hit Queries: The Limit entry box is removed, and the 'Best Eval' and 'Best Anno' checkboxes are moved from Attributes to the front panel. A new column called '#Best' is added to indicate whether the hit is assigned as the best hit for any sequence that aligned to it; this will always be >0 if 'Best Eval' or 'Best Anno' is selected.
  3. All searches are queued so that TCW is not frozen during database retrieval.

TCW Version 1.6.1 (16July16)
This release contains changes to viewSingleTCW.

  1. All writing to disks has been removed except for writing to Java Preferences, where TCW will not fail if the user does not have write permission.
  2. The columns panel for Sequences has been condensed.
  3. Some features that were of little use were removed.

TCW Version 1.5 Release dates from 24 March 2016 to 7 June 2016
The major changes are: (1) runAS provides a graphical interface for the annotation setup. (2) New demo files. (3) More options for querying GO in viewSingleTCW.

viewSingleTCW

  1. The Sequence Results table provides a description of the filters applied for the corresponding table.
  2. The Export on the Main Sequence Table did not always work on some machines; this has been fixed.
  3. Basic Sequence and Basic Hit Query have been greatly modified for clarity. The Species selection for the Basic Hit Query has been improved.
  4. An option to view the GO paths for a selected GO has been added.
  5. The GO List and GO Tree outputs are clearer, and the GO Help has been expanded.
  6. Added an option to export the replicates for all seqIDs in the sequence table.

runAS

  1. A Java interface has been created for the "Annotation Setup". It guides the user in downloading the UniProt taxonomic database and building the GO database. It replaces the original Perl scripts, and is faster and uses less memory. See Annotation Setup (12 April 2016).
  2. runAS (Annotation SetUp) has been updated to provide better messages and use less memory (29 April 2016).

runSingleTCW has been updated to provide much better error messages and some small bugs were fixed.

New demo files with recent annotations. The old ones work fine, but the annotations are 4 years out of date and the sequence quality is not as good. Plus, the new demos include (1) a protein sequence demo, and (2) quality values for the mixed Illumina transcripts and Sanger assembly.

Release 14 March 2016 -- New GO Features in viewSingleTCW
To use the new features on pre-March 14th built TCW databases, it is necessary to
(1) update the TCW GO tables (./execAnno <database name> -G) and then (2) rerun the runDE GOseq option.

  1. Basic GO annotations display: has multiple new options to show information about a selected GO, i.e. showing the ancestors as a list, ancestors as a tree, and descendants. It also has a few more count columns in the table to distinguish hits that have been directly assigned to a GO in the UniProt files versus hits that are descendants of a GO, hence, are inherited.
  2. Basic AnnoDBs display: has an option to show the GOs assigned to a selected hit.
  3. View Sequence: has options to show hits, ancestors or tree associated with a selected GO.
  4. A new best hits column: the Best Eval and Best Anno do not always have GO annotations. The new "Best Hit With GO" is the best e-value with GO annotations.
    • Select on "Columns".
    • Filter under "Best Hit".
    • Displayed in Sequence details.
  5. Sequence details shows differential expression values.
Additionally, the GO tables in the TCW database are significantly smaller (30%).

TCW Version 1.4

Release (1 Mar 2016)
This release has changes for exploring the details of a selected sequence from the sequence table.
  1. The options for the sequence detail page are restructured for clarity (Detail, Frame, GO, Hit Alignment).
  2. For hit alignment, a new display is available to show the alignment in a format like UniProt uses, i.e. multi-line, where a "+" is used for a similar (positive-scoring) substitution, the amino acid is shown for an exact match, ...
This version works on previously built v1.4 TCW databases.

Release (8 Feb 2016)
To use the new features on previously built TCW databases, it is necessary to
(1) reinstall the GO database (i.e. newGOver.pl), (2) update the TCW GO tables (./execAnno <database name> -G) and then (3) rerun the runDE GOseq option.

  1. Add GO and Interpro:
    1. Basic Query Sequence - add best GO and Interpro as columns
    2. Basic Query annoDB - add GO and Interpro as columns and filters for these to the Attributes
    3. Main Table Column - add as columns for Best Eval and Best Annotation
  2. Basic Query GO
    1. A filter has been added for e-value, where the e-value assigned is the best of all UniProt-Sequence hits that contain this GO.
    2. For a selected GO, the following options are available:
      • Show:
          • all hits mapped to the GO (assigned only)
          • all hits mapped to the GO (assigned or child)
        This display shows the evidence code for assigned hits.
      • Copy GO Term to clipboard
      These options are explained in detail on the Basic Query GO Help page.
  3. Sequence Details -- View GOs (optionally for a selected hit):
    1. If no selected hit, show union of assigned hits with evidence codes
      A "Hits for selected" button allows the user to see all the {HitID, e-value, EC, %sim} for the GO.
    2. If selected hit, show assigned and ancestor GOs, and all assigned Interpro, enzyme EC, KEGG, and Pfam identifiers.
  4. Overview contains GO version and a table of GO p-values.
  5. Changed Basic annoDB Hit:
    1. Add 'Load File' so a list of hit identifiers or descriptions can be loaded together (the Load File is also on the Basic Query Sequence page).
    2. When a search string is entered, check "Use Filters" if the filters should be used in addition to the search string.
    3. Default in Attributes changed from Best Eval to Best Anno
  6. Bug fix:
    1. Blast feature: if there were n's in a nucleotide sequence, it was assumed to be protein.
    2. Columns KEGG, EC and PFam: if one of these was selected for both Best Eval and Best Anno the same identifier was shown in both columns.
Changes to UniProt/GO installation scripts:
  1. newGOver.pl - adds the GO version to the local GO database, and adds GO and InterPro to the TCW UniProt table in the local GO database used to build the TCW GO tables. This script needs to be rerun, and the TCW GO tables updated (./execAnno db -G).
  2. newUPfull.pl - The full .dat is not downloaded with newUPver.pl since it is so big and only a subset of it is typically used. However, that means there will be no GOs for the subset; this script downloads the .dat files. It should be run after newUPver.pl and before newGOver.pl is run.
Next release: there will be another release with more GO improvements within a month. Three improvements will be:
  1. the MySQL hit-GO tables can be very big as they contain both assigned and ancestor relations; this will be reduced to just assigned.
  2. an improved view for the GO tree
  3. a filter will be included for the evidence code.

TCW Version 1.3

Version 1.3.9 (10 Dec 2015)
  1. runDE:
    • Upgraded DESeq to DESeq2. EdgeR works with their latest release (Oct 2015).
    • For the built-in EdgeR and DESeq2, their adjusted p-values are used, so checking the TCW FDR is not necessary.
    • EDASeq has been removed as it can be executed with an R-script if desired.
  2. Terminology has been updated to reflect current practices.

Version 1.3.8 (30 Nov 2015 - 5 Dec 2015)

  • This release had a major feature added to the runDE program, which computes differential expression using published DE methods that execute in the R environment. The changes are as follows:
    1. Previously, three methods were available; however, methods can become out of date and new ones are published, and there was no way for the user to change them without changing the TCW code. In this release, there is a new option to supply an R-script. The needed values (e.g. matrix of counts) are written to the R environment, the R-script is run using the supplied variables, and the results are read and entered into the TCW database. runDE has an option to filter sequences that have low read counts before computing DE.
    2. The interface has been restructured for clarity.
    3. Can now run "runDE <database>", which by-passes the sTCW database chooser.
  • This version works with Diamond release 4/2015.
  • Some oddities in the runSingleTCW were cleaned up.

Releases (2 Oct 2015 - 8 Nov 2015)
This release contains further enhancements to the ORF finding algorithm. It computes codon and hexamer usage from the regions of sequence that have good annotation hits, and uses their log-likelihood ratio to select between two similar length candidate ORFs (when there is no annotation hit). It has a new display in viewSingleTCW to show the codons of a sequence in any of the 6 frames. More information can be found in TCW ORF finder.
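The idea of scoring candidate ORFs by a log-likelihood ratio can be sketched as follows. This is a simplified illustration, not the TCW ORF finder: it uses codons only (no hexamers), pseudocounts of 1, and assumes a uniform background of 1/64 per codon; the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_freqs(coding_regions: List[str], pseudocount: float = 1.0) -> Dict[str, float]:
    """Codon frequencies from in-frame coding regions (e.g. regions covered by good hits)."""
    counts: Counter = Counter()
    for region in coding_regions:
        region = region.upper()
        for i in range(0, len(region) - 2, 3):
            codon = region[i:i + 3]
            if set(codon) <= set("ACGT"):  # skip codons containing n's
                counts[codon] += 1
    total = sum(counts.values()) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def orf_score(orf: str, coding: Dict[str, float]) -> float:
    """Log-likelihood ratio of the ORF's codons: coding model vs. a uniform
    background of 1/64 per codon (the uniform background is an assumption)."""
    background = 1.0 / len(CODONS)
    orf = orf.upper()
    score = 0.0
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if codon in coding:  # codons with n's are not in the model and are skipped
            score += math.log(coding[codon] / background)
    return score


def pick_orf(candidates: List[str], coding: Dict[str, float]) -> str:
    """Choose between similar-length candidate ORFs by the higher score."""
    return max(candidates, key=lambda orf: orf_score(orf, coding))
```

A positive score means the ORF's codon usage looks more like the annotated coding regions than like the background; a real implementation would also model hexamers and a background estimated from the data.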

  1. viewSingleTCW:
    1. For location, the group could be 'scaffold', 'chr', etc; these are generally followed by a number. A column was added that contains the number only, which allows numeric sorting.
    2. Added a Filter for sequences that have a location.
    3. Speeded up some operations in the Basic searches.
    4. A column has been added to viewSingleTCW that is the count of the n's within a sequence.
    5. The n's can be viewed with the "Show Sequence by Frame" option (upper left pull-down on the sequence detail page).
  2. runSingleTCW:
    1. For 'Add Remark', where remarks can also be removed, added the ability to remove all remarks except TCW added remarks.
    2. The options for the ORF finder are incorporated into runSingleTCW. The ORF finder options are shown on the viewSingleTCW 'Overview' at the bottom.
    3. Slight redesign to the runSingleTCW interface and the annoDB Options menu for clarity.
    4. Input fasta files may now have comment lines starting with '#'.

Version 1.3.5 (5 Sept 2015)

This release has improvements for determining the best reading frame, where the frame with a protein hit is given precedence.

Version 1.3.4 (7 Aug 2015)

  1. The code has been restructured to be clearer. The only impact this may have on users is that the applets should now reference:
    1. stcw.jar: CODE="jpave.query_interface.JPaveApplet" => CODE="jpave.viewer.STCWApplet"
    2. mtcw.jar: CODE="cmp.main.CPaveApplet" => CODE="cmp.viewer.MTCWApplet"
    And there are three jars instead of two: stcw.jar (viewSingleTCW), runstcw.jar (runSingleTCW) and mstcw.jar (viewMultiTCW and runMultiTCW).
  2. The computation of the Best Eval is now strictly the best e-value (it did have some logic to get best annotation with slightly less-good e-value). The term 'unk' has been added to the terms ignored for 'Best Annotation', as it is the default description when none is provided.
  3. viewMultiTCW counted the percentage of RPKM >=1000, but was not including those with no RPKM in the total count.
  4. viewSingleTCW: the rarely used "Filter Pairs" had quit working.

Version 1.3.3 (7 July 2015)

  1. Locations: From runSingleTCW, locations can be entered using the "Add Remarks and Locations" button. The input file contains rows of (seqid, location) pairs, where the location information is in the format ">scaffold:start-end(strand)", e.g. ">LG_1:100-500(-)". Note, the last version allowed the location to be given as the seqid in the sequence file; this option allows the locations to be added after the database has been created.
  2. Minor bug fixes and changes:
    1. The viewSingleTCW overview now additionally reports RPKM ranges.
    2. The viewSingleTCW Blast option did not work for protein databases. Also, the display of the 'long' form of output had a messed-up indentation. Both have been fixed.
    3. In the calculation of the 'Best Annotation': any hit with description "unknown" is now omitted (this description is used by GenBank nr). Some other minor changes were made to adapt to changes in UniProt.
    4. The annoDB hits are ranked according to their e-value -- which had been broken and is now fixed. Also, the "View Sequence" options of seeing "Best annoDB, eval & anno" was split into "Best eval & anno" and "Best annoDB".
    5. In the Basic searches, a "_" in a substring was not matched literally, because "_" is a special character in MySQL LIKE patterns (it matches any single character). This has been fixed.
    6. runMultiTCW has more checks on the input.
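The "_" fix in item 5 above comes down to escaping LIKE wildcards. A minimal sketch, assuming MySQL's default backslash escape character and a search term that is bound as a query parameter; the function name is hypothetical and this is not the TCW code.

```python
def like_contains(term: str) -> str:
    """Build a MySQL LIKE pattern that matches `term` literally anywhere in a column.

    Backslash is MySQL's default LIKE escape character, so it is escaped first,
    then the two LIKE wildcards: '%' (any run of characters) and '_' (any one character).
    The result is meant to be passed as a bound query parameter, not pasted into SQL text.
    """
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return "%" + escaped + "%"
```

With a DB-API driver such as PyMySQL, the pattern would be bound as a parameter, e.g. `cursor.execute("SELECT ... WHERE description LIKE %s", (like_contains(term),))`, where the table and column are placeholders. Escaping the backslash first matters: otherwise the backslashes added for `%` and `_` would themselves be doubled.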

Release (16 Jun 2015)

Location information. This release adds the ability to load location information, which is useful when the input is predicted genes (with introns removed).

  1. The input sequence fasta files can have location information in the format
    ">scaffold:start-end(strand)", e.g. ">LG_1:100-500(-)".
  2. The location information is entered into four new columns that are displayed in the sequence table.
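The location format above can be parsed with a regular expression. This sketch also extracts the trailing number of the group (e.g. 1 from LG_1), matching the idea of a numeric-only column for sorting described in the Oct-Nov 2015 release; the names and the exact pattern are assumptions, not TCW's parser.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern for ">scaffold:start-end(strand)"; the leading ">" is optional here.
LOC_RE = re.compile(r"^>?(?P<group>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)\((?P<strand>[+-])\)$")


class Location(NamedTuple):
    group: str
    start: int
    end: int
    strand: str
    group_num: Optional[int]  # trailing number of the group, for numeric sorting


def parse_location(text: str) -> Optional[Location]:
    """Return the parsed location, or None if the text does not match the format."""
    m = LOC_RE.match(text.strip())
    if m is None:
        return None
    num = re.search(r"(\d+)$", m.group("group"))  # e.g. "chr10" -> 10, "chrX" -> None
    return Location(m.group("group"), int(m.group("start")), int(m.group("end")),
                    m.group("strand"), int(num.group(1)) if num else None)
```

Sorting on `group_num` puts chr2 before chr10, which a plain string sort on the group name would not.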

Version v1.3.2 (24 May 2015)

  1. Usearch/ublast can be used for searches; it works best for protein to protein.
  2. The search program can be selected on the runSingleTCW interface; e.g. run blast on SwissProt to get hits in the gray zone, but run diamond on TrEMBL for speed, accepting that the gray-zone hits will not be found.
  3. Existing annotation can be removed from the runSingleTCW window.
  4. TCW works for transcripts or protein sequences, but the terminology was basically for transcripts; rewording now uses the terminology "sequences" as much as possible.
  5. viewSingleTCW has some small interface changes for clarity.

Release (3 May 2015)

  1. TCW can extract annotation from a gzipped fasta file. Note: Diamond can search against a gzipped file but Blast cannot.
  2. newUPver.pl: added option to download the full SwissProt but not the full Trembl.
  3. newGOver.pl: added error messages and detection if the GO URL no longer exists.
  4. runSingleTCW:
    • added column for fasta file (there was only a column for the annoDB),
    • fixed a problem on Macs where it did not always detect when an annoDB had already been loaded,
    • fixed a bug where the "Used Protein" query did not work right; the annotation needs to be redone for this to work on existing TCW databases (you can just reload all blast files if they still exist).
  5. viewSingleTCW: The header information on showing sequence to protein alignments is more informative.

Version 1.3.1 (27 Feb 2015)

  1. TCW provides the option to use Diamond for searching protein databases. Diamond executes blastp and blastx-like searches, producing the same output with very similar e-values. It is awesomely fast!! Though note, it misses some low similarity hits that Blast gets. See using diamond for details on using this program in TCW and performance.
  2. TCW used to read an old NCBI RefSeq format, which is not compatible with the NCBI nr format, so TCW has been upgraded to read the NCBI nr format. NOTE: GO annotation is only provided for UniProt hits, as it reads the necessary information from the UniProt .dat files.
  3. A few small schema (i.e. database) additions for the following:
    1. A new column to provide the number of NCBI hits for a sequence/contig (it already has #Swiss, #Trembl, #NT in viewSingleTCW, Select Columns).
    2. The viewSingleTCW Overview is updated to show better statistics for the annoDB hits.
    If you have an existing TCW database, view it with viewSingleTCW and it will be automatically updated.
  4. ./execAnno <project name> -a removes the current annotation only. This is useful if you want to try different parameters. That is, if the database contains the hits for an annoDB, runSingleTCW will not let you edit parameters to re-run diamond or blast; this option removes the annotation so you can try different parameters easily.
  5. The execAnno/runSingleTCW provides clearer trace output, and multiple other little changes for clarity. A few tiny bug fixes.

Release (28 Dec 2014)

On viewSingleTCW, in the "Select Columns" and "Filter Query" panels, it now shows the library Title next to the column name and the library names that were used in a DE calculation. If a library does not have a Title, you can add one with runSingleTCW. For the library names to show for the DE column, you need to rerun the DE calculations using runDE.

Release (27 Nov 2014)

  1. The signed applet displayed SyMAP instead of TCW
  2. All the exec scripts (e.g. execAnno) had the wrong path to the jar file

Release (28 Aug 2014)

There are no major changes of functionality for this release, however there are some substantial alterations.

  1. The applets have been signed with a proper certificate to minimize security popups and blocks.
  2. Multi-host browsing capability has been removed as being too prone to problems. Now, HOSTS.cfg can only contain one host/username/password set, and this will be used for all operations.
  3. Mac OSX binaries have been supplied for all auxiliary programs, so all functions should now work on (64-bit) OSX without additional install.
  4. Jar files have been renamed stcw.jar and mtcw.jar
  5. For the DE feature, the path to library libjri.so has been properly included to remove the need to copy this library to a system location
  6. For connecting to the database, TCW will try several variations of 'localhost', including IP address and domain name, to reduce the need to add additional MySQL user entries
Known bugs:
  1. If you have pairs computed, sometimes the pairs table shows when you select "Show all sequences"; just go to the "Filter Query" and select search and you will get the all-sequence table.
  2. Occasionally when I add an annoDB to an existing project, it does not show up in the interface but it does exist internally, i.e. if you "Annotate", it is included in the annotation.

TCW Version 1.2

Release: 18 December 2013

Major changes

  1. DE values
    1. The DE values only provided significance but not direction, e.g. if Lib1 compared to Lib2 has a low p-value, is Lib1>Lib2 or is Lib1<Lib2? Hence, if Lib1 is less than Lib2, the p-value will now be negative.
    2. You can just rerun runDE on your project(s) to update the p-values; you must also rerun GOseq. The Overview will be regenerated once you view the project again.
    3. The sort on the DE column ignores the sign so that the most significant values will sort to the top or the bottom.
    4. viewSingleTCW: the DE Pvalues query allows you to select for each individual library pair up, down or either.
    5. viewMultiTCW: has a new query that allows you to view clusters that have similar DE values; i.e. view the clusters that have {at least one, all} members from N species that have a significant {up, down} DE values for one or more selected pairs.
  2. runSingleTCW:
    1. The Add Remarks is more robust and now allows appending remarks.
  3. viewSingleTCW:
    1. A Blast option allows the user to blast a sequence against those in the database; for applets, the user must have blast on their machine.
    2. An Export type has been added to include the level N GOs on output for the displayed sequences, where N (or a range) will be a parameter on the Export menu.
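The signed p-value convention in the DE items above can be sketched as follows. The expression inputs (e.g. mean normalized counts per library) and the function names are hypothetical, and "up" is taken to mean Lib1 > Lib2; this only illustrates the convention, not how runDE computes p-values.

```python
from typing import List


def signed_pvalue(pvalue: float, expr_lib1: float, expr_lib2: float) -> float:
    """Negate the p-value when Lib1 is lower than Lib2, so the sign carries direction."""
    return -pvalue if expr_lib1 < expr_lib2 else pvalue


def sort_by_significance(values: List[float]) -> List[float]:
    """Sort ignoring the sign, so the most significant values come first."""
    return sorted(values, key=abs)


def passes(value: float, cutoff: float, direction: str) -> bool:
    """Filter a signed p-value; direction is 'up' (Lib1 > Lib2), 'down' (Lib1 < Lib2) or 'either'."""
    if abs(value) >= cutoff:
        return False
    if direction == "up":
        return value > 0
    if direction == "down":
        return value < 0
    return True
```

One caveat of this encoding is that a p-value of exactly 0 cannot carry a sign.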

Smaller changes and bug fixes

  1. viewSingleTCW: If a single sequence or DB hit is selected in the Basic search, the "View Selected Sequence" will display it in the table instead of going directly to the Sequence Detail page, as the DE values are only available from the table.
  2. Fix bug which caused occasional truncation of annotation loading

TCW Version 1.1

Release: 16 July 2013

Major changes

  1. runSingleTCW:
    1. When defining the count files for "Generate File", a directory can be specified and all valid files automatically entered.
    2. Selecting the 'Best Anno' has been further improved when using UniProt, as whether the hit is from SwissProt (versus TrEMBL) is taken into account.
  2. runDE:
    1. Two ways have been added to add multiple DE columns: (1) All Pairs for Group 1, where every library selected in group 1 will be compared with all others. (2) Get Pairs from File, where the rows in the file list Group 1, Group 2 and the column name.
  3. viewSingleTCW:
    1. A trimmed set of "most interesting" GOs is computed based on GOseq p-values, if GOseq was run.
    2. The BasicGO search output has a tree view mode to see the GO hierarchy.
    3. Contig overview: the lowest level GOs are shown on the main overview page. Using a pull-down, the tree of GOs for the contig can be displayed, or a hit can be selected and only the GOs for that hit are displayed.
  4. runMultiTCW:
    1. The Pearson Correlation Coefficient may be run on all pairs of a cluster.
    2. Improved algorithm for assigning the best description to each cluster.
  5. viewMultiTCW:
    1. The percentage of PCC>=0.8 is shown for each cluster.
    2. The percentage of RPKM>1000 is shown for each cluster.

Smaller changes

  1. runSingleTCW:
    1. Bug fix: DE values could not be added with runDE to assembled contigs, which has been fixed in the assembler.
    2. Though incremental annotation was/is supported, it was cleaned up to ensure that no extra steps were performed.
  2. viewSingleTCW:
    1. Replicas can be viewed from the Sequence detail page.
    2. The display for floating points can be changed on "Select Columns", which applies to all tables that contain any floating point.
    3. On Basic DB hit page, for read libraries, the count was being shown; this has been changed to RPKM.
    4. Multiple projects can be viewed from the same viewSingleTCW startup window without problems.
    5. The #5' and #3' were wrong in contig overview. Removed columns #loners (for assembled contigs) and #Shared hits (for pairs), as these columns no longer have values.
    6. Bug fix: GO query produced an error if GO ID was selected but the go id string did not have numbers.
    7. Bug fix: error viewing sequences for GO
    8. Bug fix: proteins could not be aligned if the project was created with peptide sequences.
  3. viewMultiTCW applet:
    1. The MUSCLE button does not show on the applet, as it will not run from the applet.
    2. The Filter query view looked odd due to missing +/- icons, which has been fixed.

TCW Version 1.0

Release: 15 April 2013

There are major changes from PAVE to TCW, where the biggest are:

  1. New runDE: computes differential expression using published methods for R.
  2. New runMultiTCW: builds a comparison database from multiple single TCW databases.
  3. New viewMultiTCW: view the comparison database.
  4. The manager and viewer for a single species database (as processed by PAVE), are now called runSingleTCW and viewSingleTCW and the database is referred to as the sTCW database.
    1. When using UniProt for annotation, the GO, Pfam, EC and KEGG identifiers are extracted and added to the sTCW database. The GO database is used to add the GO level information. The viewSingleTCW has a "Basic GO Query".
    2. The runSingleTCW will take protein sequences and quantitative counts as input (hence, the sequences in TCW may be assembled consensus sequences, gene models or proteins, so the generic term 'sequence' is used for all these cases). All the other TCW programs can also use protein sequences.

There are also many small feature enhancements and some bug fixes. Here are some of them:

  1. The "1st best hit" has been changed to the "Best Eval" and the "Best Hit" has been changed to the "Best Anno", where the second uses the hit that does not have phrases such as "uncharacterized protein" in its description.
  2. Either the new Blast+ or the legacy Blast can be used.
  3. Remove all usages of InnoDB.
  4. A new column of fold change has been added to viewSingleTCW.
  5. Tables can be 'copied' to the clipboard.
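The difference between "Best Eval" and "Best Anno" can be sketched as follows. The phrase lists are assumptions drawn from terms mentioned in these notes ("uncharacterized protein", "unknown", "unk"); TCW's actual rules are more elaborate (e.g. the SwissProt versus TrEMBL preference), and falling back to the Best Eval hit when no informative hit exists is also an assumption.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    hit_id: str
    evalue: float
    description: str


# Hypothetical rules based on terms mentioned in these release notes.
UNINFORMATIVE_PHRASES = ("uncharacterized protein",)  # matched anywhere in the description
UNINFORMATIVE_EXACT = ("unknown", "unk")               # matched against the whole description


def is_informative(description: str) -> bool:
    desc = description.strip().lower()
    if desc in UNINFORMATIVE_EXACT:
        return False
    return not any(p in desc for p in UNINFORMATIVE_PHRASES)


def best_eval(hits: List[Hit]) -> Optional[Hit]:
    """Strictly the hit with the lowest e-value."""
    return min(hits, key=lambda h: h.evalue, default=None)


def best_anno(hits: List[Hit]) -> Optional[Hit]:
    """Lowest e-value among informative hits, falling back to Best Eval (an assumption)."""
    informative = [h for h in hits if is_informative(h.description)]
    return best_eval(informative) or best_eval(hits)
```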


Email Comments To: tcw@agcol.arizona.edu