Overview for sTCW_demoTra

Project:  tra   #Seqs: 211   #Hits: 12,296   #GOs: 6,144    TPM  Seq-DE  GO-Enrich   Pairs

13-Jul-22 Build Database  sequences loaded from external source
13-Jul-22 Last Annotation with sTCW v4.0.3

INPUT
   Counts:
      SEQID  ID          SIZE  TITLE            #REPS
      tra    Stem   1,865,201  Tissue Stem          5
      tra    Root     955,097  Tissue Root          5
      tra    Oleaf    474,551  Tissue Old Leaf      5

   Sequences:
      SEQID  SIZE  TITLE                             AVG-len  MED-len
      tra     211  Demo of assembled Illumina reads    1,079      861

ANNOTATION
   Hit statistics:
      Sequences with hits     210  (99.5%)     Bases covered by hit  180,339  (79.2%)
      Unique hits          12,296              Total bases           227,683
      Total sequence hits  17,763

   Annotation databases (annoDBs): 7   (see Legend below)
      ANNODB            ONLY  BITS  ANNO  UNIQUE  TOTAL   AVG  Rank  HAS (%Seqs)   AVG  COVER  COVER
                                                         %SIM   =1   HIT          %SIM   >=50   >=90
      SP-plants            0     0     0   1,344  2,959  53.8    |   193 (91.5%)  70.0  61.7%   3.1%
      SP-invertebrates     0     0     0     624  1,270  42.5    |   126 (59.7%)  47.5  30.2%   0.8%
      SP-fungi             0     0     0     896  1,517  43.0    |   122 (57.8%)  46.4  24.6%   0.8%
      SP-bacteria          0     0     0     790  1,126  44.1    |    63 (29.9%)  45.8  25.4%     0%
      SP-full_BFIP         0     0     0     744  1,696  45.7    |   125 (59.2%)  49.1  32.0%   0.8%
      TR-plants            7   207   207   4,626  5,241  76.0    |   210 (99.5%)  82.3  68.6%  11.0%
      TR-invertebrates     0     3     3   3,272  3,954  48.9    |   186 (88.2%)  53.8  37.6%   2.2%

   Top 15 species from total: 1,594
      SPECIES (25 char)         BITS   ANNO  TOTAL     SPECIES                 BITS   ANNO   TOTAL
      Musa acuminata              97      2    490     Dendrobium catenatum       2      2      70
      Musa balbisiana             52     71    312     Macleaya cordata           2      2      38
      Elaeis guineensis           12     45    347     Curcuma alismatifolia      2      2       2
      Ensete ventricosum          11     14    206     Vitis vinifera             1      3      86
      Zingiber officinale          7      7     27     Apostasia shenzhenica      1      3      59
      Ananas comosus               3     17    456     Ricinus communis           1      2      41
      Meloidogyne enterolobii      3      2     10     Pinus tabuliformis         1      2       4
      Anthurium amnicola           2      4     96     Other                     13     32  15,519

   Gene ontology statistics:
      Unique GOs          6,144              Unique hits with GOs  10,632  (86.5%)
      Sequences with GOs    208  (98.6%)     Seq best hit has GOs     107  (50.7%)
      Has goslim_plant       94

      biological_process  4,442  (72.3%)     is_a                   9,866
      molecular_function    979  (15.9%)     part_of                1,057
      cellular_component    723  (11.8%)

EXPRESSION
   TPM: (% of 211)
               <2.0     2-5    5-10    10-50  50-100   100-1k     1k-5k     >=5k
      Stem   3 (1%)  2 (1%)  1(<1%)   6 (3%)  7 (3%)  45(21%)  100(47%)  47(22%)
      Root   1(<1%)  0 (0%)  1(<1%)   1(<1%)  0 (0%)  60(28%)  101(48%)  47(22%)
      Oleaf  6 (3%)  1(<1%)  1(<1%)  11 (5%)  6 (3%)  40(19%)   96(45%)  50(24%)
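
   The TPM table above bins each sequence by its TPM value per condition.
   The sketch below shows one common way to compute TPM from read counts
   and sequence lengths, and to count values per bin. It is illustrative
   only: the function names, the half-open bin boundaries (lower edge
   included, upper edge excluded) and the exact TPM formula are
   assumptions, not taken from sTCW itself.

```python
def tpm(counts, lengths):
    """Return TPM values for parallel lists of read counts and lengths (bases)."""
    # Reads per kilobase for each sequence.
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths)]
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)
    # Scale so that the values sum to one million.
    return [r / total * 1_000_000 for r in rpk]


# Bin labels copied from the table header; the edges are an assumption.
BINS = [
    ("<2.0", 0, 2), ("2-5", 2, 5), ("5-10", 5, 10), ("10-50", 10, 50),
    ("50-100", 50, 100), ("100-1k", 100, 1000), ("1k-5k", 1000, 5000),
    (">=5k", 5000, float("inf")),
]


def bin_counts(values):
    """Count how many TPM values fall into each bin (lower <= v < upper)."""
    out = {label: 0 for label, _, _ in BINS}
    for v in values:
        for label, lower, upper in BINS:
            if lower <= v < upper:
                out[label] += 1
                break
    return out
```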

   Differential expression:  (% of 211)
             <1E-5    <1E-4   <0.001    <0.01     <0.05
      StRo  6 (3%)  17 (8%)  32(15%)  65(31%)   97(46%)
      StOl  0 (0%)   2 (1%)  12 (6%)  65(31%)  101(48%)
      RoOl  1(<1%)  16 (8%)  37(18%)  74(35%)  110(52%)
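
   Each column of the differential expression table is cumulative: a
   sequence counted under <1E-5 is also counted under every larger cutoff.
   A minimal sketch of that tally follows. The function name is
   hypothetical, and whether sTCW reports raw or adjusted p-values here is
   not stated in this overview.

```python
# Cutoffs copied from the table header.
CUTOFFS = [1e-5, 1e-4, 1e-3, 1e-2, 0.05]


def cumulative_counts(pvalues, total):
    """Return (cutoff, count, percent of total) for each cutoff."""
    rows = []
    for cutoff in CUTOFFS:
        n = sum(1 for p in pvalues if p < cutoff)
        rows.append((cutoff, n, 100.0 * n / total))
    return rows
```

   For example, 97 of 211 sequences below 0.05 gives 100 * 97 / 211, which
   rounds to the 46% shown for StRo.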

   Gene ontology enrichment:  (% of 6,144)
             <1E-5   <1E-4  <0.001    <0.01     <0.05
      StRo  0 (0%)  1(<1%)  6(<1%)  59 (1%)  306 (5%)
      StOl  0 (0%)  0 (0%)  0 (0%)   1(<1%)   43 (1%)
      RoOl  0 (0%)  1(<1%)  2(<1%)  26(<1%)  183 (3%)

SEQUENCES
   Sequence lengths:
      <=100  101-500  501-1000  1001-2000  2001-3000  3001-4000  4001-5000  >5000
      0(0%)  37(18%)   84(40%)    72(34%)      8(4%)      8(4%)      2(1%)  0(0%)

   Quality:
      Sequences with #n>0:   3  ( 1.4%)
      Sequences with #n>10:  1  ( 0.5%)

   ORF lengths:
      <=100  101-500  501-1000  1001-2000  2001-3000  3001-4000  4001-5000  >5000
      0(0%)  62(29%)   92(44%)    47(22%)      4(2%)      6(3%)      0(0%)  0(0%)

   ORF stats:   Average length 862
      Has Hit            210  (99.5%)    Both Ends     69  (32.7%)    Multi-frame    6   (2.8%)
      Is Longest ORF     192  (91.0%)    ORF>=300     190  (90.0%)    Stops in Hit   6   (2.8%)
      Markov Best Score  210  (99.5%)    ORF=Hit      103  (48.8%)    >=9 Ns in ORF  1    (<1%)
      All of the above   191  (90.5%)      with Ends   32  (15.2%)

   GC content: 48.65%
      Pos1  18.3%              5UTR    CDS   3UTR              5UTR     CDS    3UTR
      Pos2  13.7%    %GC      43.76  48.78  36.25    Length     21k    182k     25k
      Pos3  16.6%    CpG-O/E   0.89   0.71   0.63    AvgLen   100.3   861.7   117.1
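
   For reference, %GC and CpG-O/E can be computed per sequence as sketched
   below. This uses the widely used observed/expected definition
   (CpG count * length) / (C count * G count); whether sTCW uses exactly
   this formula, and how it treats Ns, is an assumption. The function
   names are hypothetical.

```python
def gc_percent(seq):
    """Percent of A/C/G/T bases that are G or C (Ns are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G)."""
    seq = seq.upper()
    n = sum(seq.count(base) for base in "ACGT")
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * n / (c * g)
```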

   Similar pairs: 50
      Nucleotide             16
      Translated nucleotide  50

LOCATIONS
   Sequences with location:          12  unique locations: 12
   Sequences on positive strand:      7  negative strand:  5
   Sequences per group:
      1  2  3-4  5-7  8-10  11-20  21-30  >30
      0  0    3    0     0      0      0    0


-------------------------------------------------------------------
PROCESSING INFORMATION
   AnnoDB files:
      Type  Taxo           FILE                                DB DATE    ADD DATE   EXECUTE
      sp    plants         uniprot_sprot_plants.fasta          21-Dec-21  13-Jul-22  diamond  --masking 0
      sp    invertebrates  uniprot_sprot_invertebrates.fasta   21-Dec-21  13-Jul-22  diamond  --masking 0
      sp    fungi          uniprot_sprot_fungi.fasta           21-Dec-21  13-Jul-22  diamond  --masking 0
      sp    bacteria       uniprot_sprot_bacteria.fasta        21-Dec-21  13-Jul-22  diamond  --masking 0
      sp    full_BFIP      uniprot_sprot_xBFxIxPxxx.fasta      21-Dec-21  13-Jul-22  diamond  --masking 0
      tr    plants         uniprot_trembl_plants.fasta         21-Dec-21  13-Jul-22  diamond  --masking 0
      tr    invertebrates  uniprot_trembl_invertebrates.fasta  21-Dec-21  13-Jul-22  diamond  --masking 0

   Prune: none

   Gene ontology: go-basic.obo-Nov2021  GOdb: go_demo  [GOs added with sTCW v4.0.3]
   GO slim: goslim_plant

   ORF finder:
      Use ATG only for start site
      Rule 1: Use Good hit: E-value <=1E-10 or Sim >= 20%
      Rule 2: Use longest ORF if Log Ratio > 0.5
      Rule 3: Use best Markov score if Log Ratio > 0.4
              Train using best hits (204 seqs, 174.5k bases)

   Differential expression computation:
      Column       Method                         Conditions
      StRo         edgeRglm.R CPM>1>=2            Stem : Root
      StOl         edgeRglm.R CPM>1>=2            Stem : Oleaf
      RoOl         edgeRglm.R CPM>1>=2            Root : Oleaf
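
   The "CPM>1>=2" suffix is read here as a pre-filter: keep a sequence only
   if its counts-per-million exceeds 1 in at least 2 samples. That reading
   is an assumption, and the sketch below is a Python illustration of the
   filter only, not the edgeRglm.R script itself. Function and parameter
   names are hypothetical.

```python
def cpm(counts_by_sample, library_sizes):
    """Counts per million for one sequence across samples."""
    return [c / size * 1_000_000 for c, size in zip(counts_by_sample, library_sizes)]


def passes_filter(counts_by_sample, library_sizes, min_cpm=1.0, min_samples=2):
    """True if CPM > min_cpm in at least min_samples samples."""
    values = cpm(counts_by_sample, library_sizes)
    return sum(v > min_cpm for v in values) >= min_samples
```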

   GO enrichment computation:
      Column       Method                         Cutoff
      StRo         goSeqNoFDR.R                   5.0e-02
      StOl         goSeqNoFDR.R                   5.0e-02
      RoOl         goSeqNoFDR.R                   5.0e-02

-------------------------------------------------------------------
LEGEND:
   annoDBs:
      ANNODB    is DBTYPE-TAXO, which is the DBtype and taxonomy
      ONLY      #Seqs that hit the annoDB and no others
      BITS      #Seqs with the overall best bit-score from the annoDB
      ANNO      #Seqs with the overall best annotation from the annoDB
      UNIQUE    #Unique hits to the annoDB
      TOTAL     #Total seq-hit pairs for the annoDB
      AVG %SIM  Average percent similarity of the total seq-hit pairs
      HAS HIT   #Seqs (and percent of #Seqs) that have at least one hit from the annoDB
      BEST HIT  The columns to the right of the Rank=1 bar refer to the best hit (Rank=1):
         AVG %SIM  Average percent similarity of the best hit seq-hit pairs
         COVER>=N  Percent of HAS HIT sequences where the best hit has similarity>=N% and hit coverage>=N%
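
   To work with the annoDB table programmatically, a row can be split into
   the columns described above. The sketch below is one way to do that for
   rows of this overview. The field names are hypothetical labels chosen
   for illustration, not identifiers from sTCW.

```python
import re

# One label per numeric value in an annoDB row, in left-to-right order.
FIELDS = [
    "only", "bits", "anno", "unique", "total", "avg_sim",
    "has_hit", "has_hit_pct", "best_avg_sim", "cover50_pct", "cover90_pct",
]


def parse_annodb_row(line):
    """Parse one annoDB table row into a dict of floats plus the annoDB name."""
    name, rest = line.split(None, 1)
    # Numbers may contain thousands separators and a decimal part.
    numbers = re.findall(r"\d[\d,]*(?:\.\d+)?", rest)
    values = [float(n.replace(",", "")) for n in numbers]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} numbers, got {len(values)}")
    row = dict(zip(FIELDS, values))
    row["annodb"] = name
    return row
```

   With #Seqs = 211, the parsed count of sequences with a hit reproduces the
   reported percentage, e.g. 193 / 211 rounds to 91.5% for SP-plants.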

   #Seqs is the number of sequences, listed at the top of the overview
   Best annotation:
      The best hit whose description does not contain words such as 'uncharacterized protein'