Explain mTCW overview

  • The top line may have any of the labels PCC, Stats, Multi and KaKs; each indicates that the database has the corresponding information. The GOs: label indicates that GOs are in the database.
  • col:x notation: column x should be selected for viewing in the respective Cluster, Pair or Sequence table.
  • Stats: Avg(col:x) notation: use the Table... option Show Column Stats and take the Avg value for col:x from the resulting table. The same applies to Sum and StdDev.

DATASETS

This first section describes the data imported from the sTCWdbs.

CLUSTER SETS

The database will contain one or more cluster sets. The columns are from the Cluster table.
Label | Description | Compute
Statistics
  Prefix | The clusters of this set are referred to by this prefix in the various Filters and Columns. | ---
  Method | The method used to compute the clusters. | See PROCESSING at the end of the Overview
  conLen | Average consensus length of the clusters. | Stats: Avg(col:conLen)
  sdLen | Average standard deviation of the sequence lengths (AA Len) in each cluster. | Stats: Avg(col:sdLen)
  Score1 | Average of Score1. See PROCESSING for the MSA score1 method (default: Sum-of-pairs). | Stats: Avg(col:Score1)
    SD | Standard deviation of Score1. | Stats: StdDev(col:Score1)
  Score2 | Average of Score2. See PROCESSING for the MSA score2 method (default: Wentropy). | Stats: Avg(col:Score2)
    SD | Standard deviation of Score2. | Stats: StdDev(col:Score2)
Sizes
  [Range] | Number of clusters whose size falls in each range, per cluster set. | Calculate by sorting on col:Count
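The Stats: notation used in this table can be sketched in Python as follows. The rows, their values and the size ranges are made up for illustration, and `column_stats` and `size_counts` are hypothetical helpers, not part of mTCW. The use of the population standard deviation is an assumption, since this Overview does not say whether Show Column Stats uses the population or the sample formula.

```python
from statistics import mean, pstdev

# Made-up rows of a Cluster table; keys mirror the column names used above.
clusters = [
    {"Count": 2, "conLen": 310, "Score1": 1.20},
    {"Count": 4, "conLen": 455, "Score1": 0.85},
    {"Count": 3, "conLen": 298, "Score1": 1.05},
    {"Count": 12, "conLen": 520, "Score1": 0.40},
]

# Hypothetical size ranges (inclusive); the ranges shown by mTCW may differ.
SIZE_RANGES = [(2, 2), (3, 5), (6, 10), (11, None)]


def column_stats(rows, col):
    """Avg, Sum and StdDev of one column, as read from Show Column Stats.
    Assumption: population standard deviation."""
    values = [row[col] for row in rows]
    return {"Avg": mean(values), "Sum": sum(values), "StdDev": pstdev(values)}


def size_counts(rows):
    """Number of clusters whose Count falls in each size range."""
    counts = {}
    for lo, hi in SIZE_RANGES:
        if hi is None:
            label = f">={lo}"
        elif lo == hi:
            label = str(lo)
        else:
            label = f"{lo}-{hi}"
        counts[label] = sum(
            1 for row in rows if row["Count"] >= lo and (hi is None or row["Count"] <= hi)
        )
    return counts


print(column_stats(clusters, "conLen"))   # Avg 395.75, Sum 1583
print(size_counts(clusters))              # {'2': 1, '3-5': 2, '6-10': 0, '>=11': 1}
```

Sorting the Cluster table on col:Count and counting the rows in each range gives the same numbers as `size_counts`.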

PAIRS

Similar pairs were identified by comparing the sequences with a search program (BLAST or DIAMOND), where AA refers to the amino acid search and NT to the nucleotide search (NT may not exist). Use the Pair Filters to produce the Pair table.

The Overview shows the overall percentage. In the table below, any Compute cell marked "see ¹" computes the average of the percentages, which is usually close to, but not the same as, the overall percentage.
Any Compute cell marked "see ²" can only be calculated by viewing the Pairwise... alignment of each pair and computing the value manually.
Label | Description | Compute
Hits (from the hit file); for the NT statistics, replace AA with NT.
  Diff | The number of hits between sequences from different datasets. | Filter: Hits: AA pairs; Datasets: Different sets
  Same | The number of hits between sequences from the same dataset. | Filter: Hits: AA pairs; Datasets: Same sets
  Similarity | Average percent similarity. | Filter: Hits: AA pairs; Stats: Avg(col:%AAsim); see ¹
  Coverage | Average percent coverage. | Filter: Hits: AA pairs; Stats: (Avg(col:%AAcov1) + Avg(col:%AAcov2))/2; see ¹
Aligned (using dynamic programming): Filter: Statistics: Has Stats
  CDS | Number of aligned CDS bases, including gaps but not overhangs. | Stats: Sum(col:Align)
  5UTR | Number of aligned 5'UTR bases, including gaps but not overhangs. | View values; see ²
  3UTR | Number of aligned 3'UTR bases, including gaps but not overhangs. | View values; see ²
Codon column
  Codons | Number of aligned codons, excluding gaps. | Stats: Sum(col:Calign)
  Exact | Percent of codons that are exactly the same. | Stats: Avg(col:%Cexact); see ¹
  Synonymous | Percent of codons that are synonymous (different codon, same amino acid). | Stats: Avg(col:%Csyn); see ¹
    Fourfold | Percent of codons that are fourfold degenerate (4d), i.e. synonymous codons where the ith position allows any of the 4 bases. | Stats: Avg(col:%C4d); see ¹
    Twofold | Percent of codons that are twofold degenerate (2d), i.e. synonymous codons where the ith position allows either of 2 bases. | Stats: Avg(col:%C2d); see ¹
  Nonsynonymous | Percent of codons that are nonsynonymous (different amino acid). | Stats: Avg(col:%nonSyn); see ¹
Amino acid column
  Exact | Percent of amino acids that are the same. | Stats: Avg(col:%Aexact); see ¹
  Substitution>0 | Percent of amino acids that are substitutions with BLOSUM62>0. | Stats: Avg(col:%Apos); see ¹
  Substitution<=0 | Percent of amino acids that are substitutions with BLOSUM62<=0. | Stats: Avg(col:%Aneg); see ¹
Nucleotide columns
  CDS Diff | Percent of CDS bases that are different, i.e. ((Gaps+SNPs)/Align)%. | Stats: ((Sum(col:gap) + Sum(col:SNP)) / Sum(col:align)) x 100.0
    Gaps | Percent of CDS bases that are gaps, i.e. (Gaps/Align)%. | Stats: (Sum(col:gap) / Sum(col:align)) x 100.0
    SNPs | Percent of CDS bases that are SNPs, i.e. (SNPs/Align)%. | Stats: (Sum(col:SNP) / Sum(col:align)) x 100.0
  5UTR Diff | Percent of 5'UTR bases that are different. | Stats: Avg(col:%5diff); see ¹
  3UTR Diff | Percent of 3'UTR bases that are different. | Stats: Avg(col:%3diff); see ¹
Columns: Pos1 Pos2 Pos3 Total
  Transition (ts) | Percent of SNPs that are transitions at each of the 3 codon positions. | View values; see ²
  Transversion (tv) | Percent of SNPs that are transversions at each of the 3 codon positions. | View values; see ²
  ts/tv | The total number of transitions divided by the total number of transversions. | View values; see ²
Columns: GC CpG-Nt CpG-Cd
CpG-Nt (nucleotide) and CpG-Cd (codon; CpG sites do not cross codon boundaries).
  Both | Percent of CDS bases where both sequences have a GC base (intersection). The same applies to the CpG sites. | View values; see ²
  Either | Percent of CDS bases where either or both sequences have a GC base (union). The same applies to the CpG sites. | View values; see ²
  Jaccard | The total number of 'both' divided by the total number of 'either'. | View values; see ²
KaKs method: see ³
KaKs | It is rare for Ka/Ks to be exactly 1, so the following fudge factors are used:
 Rule | Fudge factor | Strength
 KaKs>1 | >= 1.006 | positive (driving change)
 KaKs=1 | >= 0.995 and < 1.006 | neutral
 KaKs<1 | < 0.995 | purifying (against change)
Set the Pair Filters according to the values in the Fudge factor column.

For NA, Filters: uncheck KaKs and check KaKs=NA.
Quartiles | Apply to the KaKs values, using the method of splitting the sorted list in half; Q1 is the median of the lower half and Q3 is the median of the upper half. | Q2: Stats: Median(col:KaKs)
Average
  Ka | Nonsynonymous substitution rate. | Stats: Avg(col:Ka)
  Ks | Synonymous substitution rate. | Stats: Avg(col:Ks)
  P-value | KaKs p-value. | Stats: Avg(col:p-value)
P-value | Counts of p-values in 4 ranges. | Sort on col:p-value. Round-off error can occur (see Display Decimal Help).
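The KaKs fudge factors and the quartile rule above can be sketched in Python as follows. `kaks_strength` and `quartiles` are hypothetical helpers and the KaKs values are made up. Two points are assumptions, not stated in this Overview: for an odd-length list the middle value is left out of both halves, and NA values are excluded before the quartiles are computed.

```python
from collections import Counter
from statistics import median


def kaks_strength(kaks):
    """Classify one Ka/Ks value with the fudge factors in the KaKs row.
    None stands for a pair whose KaKs is NA."""
    if kaks is None:
        return "NA"
    if kaks >= 1.006:
        return "positive"   # KaKs>1: driving change
    if kaks >= 0.995:
        return "neutral"    # KaKs=1: >= 0.995 and < 1.006
    return "purifying"      # KaKs<1: against change


def quartiles(values):
    """Q1, Q2, Q3 by splitting the sorted list in half (needs at least 2 values).
    Assumption: for an odd-length list the middle value is left out of both halves."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(s), median(upper)


kaks_values = [0.2, 0.5, 0.9, 1.0, 1.3, 2.0, None]  # made-up KaKs column
print(Counter(kaks_strength(v) for v in kaks_values))
print(quartiles([v for v in kaks_values if v is not None]))
```

Other quartile conventions (for example, including the median in both halves) can give slightly different Q1 and Q3 values for odd-length lists.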

SEQUENCES

The columns are in the Sequence table; use Sequence Filters to select a dataset to view its corresponding results.
Label | Description | Compute
Average Lengths | The ORFs were computed in the sTCWdbs and imported along with their translated sequences. | Stats: Avg(col:5UTR Len), Avg(col:3UTR Len), Avg(col:CDS Len)
%GC | The average percent GC for the 5'UTR, CDS and 3'UTR. | --
CpG O/E | The CpG observed/expected ratio for the 5'UTR, CDS and 3'UTR, computed as (#CpG/(#G*#C))*Len. | --
Counts | The total raw counts from each dataset. | These can be verified from the singleTCWs.
Differential Expression | The total DE from each dataset. | These can be verified from the singleTCWs.
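The %GC and CpG O/E values can be illustrated with the formula given in the table. This is a minimal sketch with made-up sequences; `gc_percent` and `cpg_oe` are hypothetical helpers. Averaging %GC per sequence (rather than pooling all bases) and returning None when a sequence has no C or no G are assumptions, since the Overview does not specify either.

```python
from statistics import mean


def gc_percent(seq):
    """Percent of bases that are G or C."""
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def cpg_oe(seq):
    """CpG observed/expected = (#CpG / (#G * #C)) * Len, as in the table.
    Returns None when there is no C or no G (hypothetical handling)."""
    s = seq.upper()
    n_cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    n_c, n_g = s.count("C"), s.count("G")
    if n_c == 0 or n_g == 0:
        return None
    return n_cpg / (n_g * n_c) * len(s)


# Made-up CDS sequences from one dataset
cds = ["ACGTTCGA", "GGCCATATAT"]
print([gc_percent(s) for s in cds], mean(gc_percent(s) for s in cds))  # [50.0, 40.0] 45.0
print([cpg_oe(s) for s in cds])                                         # [4.0, 0.0]
```

In "ACGTTCGA" there are 2 CpG sites, 2 C and 2 G in 8 bases, so O/E = (2/(2*2))*8 = 4.0.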

¹ Percents

  • Overview percents are computed by summing the numerators and the denominators over all pairs, then dividing.
  • Stats: Avg(col:X): the Table... option Show Column Stats takes the average of the percentages.
  • For example, in the mTCW_ex demo (created 20-Mar-22),
    • Overall %Exact: sum(exact codons)/sum(total codons) = 57.8%.
    • Average of %Exact: sum(%Exact)/number of pairs = 59.39%.
  • The actual counts are only available in the Pairwise... view, as described in note ² below.
  • However, the Aligned and KaKs values can be computed for any Pair table by selecting the Table... option Show Table Stats.
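The difference between the overall percent and the average of percents can be shown with a few made-up pairs. `overall_percent` and `average_percent` are hypothetical helpers: the first pools the counts (Overview style), the second averages the per-pair percentages (Show Column Stats style).

```python
from statistics import mean

# Made-up (exact codons, total codons) counts for three pairs
pairs = [(50, 100), (9, 10), (30, 90)]


def overall_percent(pairs):
    """Overview style: sum the numerators and denominators, then divide."""
    return 100.0 * sum(n for n, _ in pairs) / sum(d for _, d in pairs)


def average_percent(pairs):
    """Show Column Stats style: average of the per-pair percentages."""
    return mean(100.0 * n / d for n, d in pairs)


print(overall_percent(pairs))   # 44.5
print(average_percent(pairs))   # about 57.78
```

The two differ because the average gives the short pair (10 codons) the same weight as the longer ones, while the overall percent weights each pair by its number of codons.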

² Counts associated with percents

The only way to view most counts is through the Pair Table, as follows:
  • Select Pairwise..., then select 5UTR, CDS, 3UTR.
  • In the alignment panel, select the button specified in the Align column below.
  • This pops up a window where you select the Option specified below.
  • The resulting pop-up window has columns of counts at the top, followed by the text alignment annotated with the requested Option information.
  • Relevant numbers:
    • Calign: The codon length with overhang and gap codons removed.
    • CROP: The nucleotide length with the overhang removed.
    • SNPs: The number of nucleotide differences ignoring gaps.
Variables | Align | Option | Count
Codon Exact, Non/Synonymous | Align CDS... | Match | These counts are at the top. Divide by Calign.
Fourfold (4d), twofold (2d) | Align CDS... | Degenerate | These counts are at the top. Divide by Calign.
AA Exact, Substitution | Align CDS... | Amino acid | These counts are at the top. Divide by Calign.
ts, tv, ts/tv | Align CDS... | ts/tv | These counts are at the top. Divide by SNPs.
GC, CpG-Nt, CpG-Cd | Align CDS... | CpG | These counts are at the top. Divide by CROP.
The CpG percents are 2x since they involve two nucleotides.
5'UTR Diff | Align 5UTR... | --- | NT-Diff is at the top. Divide by CROP.
3'UTR Diff | Align 3UTR... | --- | NT-Diff is at the top. Divide by CROP.
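How these counts relate to the percents can be sketched from two aligned CDS strings. This is a hypothetical illustration, not mTCW's code: `count_pair`, `PairCounts` and `percents` are invented names, and the sketch assumes '-' marks gaps, both strings have equal length, the alignment is in frame from its first column, bases are only A, C, G and T, and the Align denominator of CDS Diff equals CROP. It uses the standard genetic code; fourfold/twofold degeneracy, BLOSUM62 substitutions and CpG counts are omitted.

```python
from dataclasses import dataclass, field

# Standard genetic code (NCBI table 1); codons are indexed in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


@dataclass
class PairCounts:  # hypothetical container, not an mTCW class
    CROP: int = 0       # aligned length with overhangs removed (gaps included)
    gaps: int = 0       # gap columns inside the cropped region
    SNPs: int = 0       # nucleotide differences, ignoring gap columns
    ts: list = field(default_factory=lambda: [0, 0, 0])  # transitions per codon position
    tv: list = field(default_factory=lambda: [0, 0, 0])  # transversions per codon position
    Calign: int = 0     # codons with no gap in either sequence
    exact: int = 0
    syn: int = 0
    nonsyn: int = 0
    gc_both: int = 0    # columns where both bases are G or C
    gc_either: int = 0  # columns where at least one base is G or C


def count_pair(seq1: str, seq2: str) -> PairCounts:
    s1, s2 = seq1.upper(), seq2.upper()
    n = len(s1)
    # Overhangs: leading/trailing columns where one sequence has a gap.
    start, end = 0, n
    while start < end and "-" in (s1[start], s2[start]):
        start += 1
    while end > start and "-" in (s1[end - 1], s2[end - 1]):
        end -= 1
    c = PairCounts(CROP=end - start)
    for i in range(start, end):
        a, b = s1[i], s2[i]
        c.gc_both += a in "GC" and b in "GC"
        c.gc_either += a in "GC" or b in "GC"
        if a == "-" or b == "-":
            c.gaps += 1
        elif a != b:
            c.SNPs += 1
            pos = i % 3  # 0-based codon position (frame assumed to start at column 0)
            # Transition: purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T).
            if (a in "AG") == (b in "AG"):
                c.ts[pos] += 1
            else:
                c.tv[pos] += 1
    # Codons containing a gap (including overhang codons) are excluded from Calign.
    for k in range(0, n - n % 3, 3):
        c1, c2 = s1[k:k + 3], s2[k:k + 3]
        if "-" in c1 or "-" in c2:
            continue
        c.Calign += 1
        if c1 == c2:
            c.exact += 1
        elif CODON_TABLE[c1] == CODON_TABLE[c2]:
            c.syn += 1
        else:
            c.nonsyn += 1
    return c


def pct(num, den):
    return 100.0 * num / den if den else None


def percents(c: PairCounts) -> dict:
    return {
        "%Cexact": pct(c.exact, c.Calign),
        "%Csyn": pct(c.syn, c.Calign),
        "%nonSyn": pct(c.nonsyn, c.Calign),
        "CDS Diff": pct(c.gaps + c.SNPs, c.CROP),
        "ts/tv": sum(c.ts) / sum(c.tv) if sum(c.tv) else None,
        "GC Jaccard": c.gc_both / c.gc_either if c.gc_either else None,
    }


# Made-up alignment: overhang codons at both ends and one internal gap codon.
pair = count_pair("ATGAAACCCGGG---TTT---", "---AAGCCAAGGCATTTTTAA")
print(pair)
print(percents(pair))
```

In this example, CROP is 15 (3 internal gap columns plus 3 SNPs give CDS Diff = 6/15 = 40%), and 4 codons remain in Calign: 1 exact, 2 synonymous (AAA/AAG, CCC/CCA) and 1 nonsynonymous (GGG/AGG).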

³ The KaKs_calculator

The KaKs_calculator (Zhang et al. 2006) is typically used to compute the KaKs values; the method used is shown on the "KaKs Method" line.