Project: tra #Seqs: 211 #Hits: 12,296 #GOs: 6,144 TPM Seq-DE GO-Enrich Pairs
13-Jul-22 Build Database sequences loaded from external source
13-Jul-22 Last Annotation with sTCW v4.0.3
INPUT
Counts:
SEQID ID SIZE TITLE #REPS
tra Stem 1,865,201 Tissue Stem 5
tra Root 955,097 Tissue Root 5
tra Oleaf 474,551 Tissue Old Leaf 5
Sequences:
SEQID SIZE TITLE AVG-len MED-len
tra 211 Demo of assembled Illumina reads 1,079 861
ANNOTATION
Hit statistics:
Sequences with hits 210 (99.5%) Bases covered by hit 180,339 (79.2%)
Unique hits 12,296 Total bases 227,683
Total sequence hits 17,763
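Below is a minimal Python sketch of how "Bases covered by hit" could be derived. It assumes that a base counts as covered if it lies inside at least one hit alignment, so overlapping hits on the same sequence are counted once. It also assumes alignments are given as 1-based inclusive (start, end) pairs on the sequence. The helper names are hypothetical, not sTCW functions.

```python
def covered_bases(intervals):
    """Count positions covered by at least one 1-based, inclusive (start, end) interval."""
    total = 0
    cur_start = cur_end = None
    # Normalize reverse-strand coordinates (start > end), then merge overlapping intervals.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def percent_covered(hits_by_seq, seq_lengths):
    """Percent of all sequence bases covered by hits; hits_by_seq maps seq id -> intervals."""
    covered = sum(covered_bases(hits_by_seq.get(seq, [])) for seq in seq_lengths)
    return 100.0 * covered / sum(seq_lengths.values())
```

With the totals above, 180,339 / 227,683 gives the reported 79.2%.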
Annotation databases (annoDBs): 7 (see Legend below)
ANNODB ONLY BITS ANNO UNIQUE TOTAL AVG-%SIM | HIT-SEQ (%Seqs) Rank=1: AVG-%SIM COVER>=50 COVER>=90
SP-plants 0 0 0 1,344 2,959 53.8 | 193 (91.5%) 70.0 61.7% 3.1%
SP-invertebrates 0 0 0 624 1,270 42.5 | 126 (59.7%) 47.5 30.2% 0.8%
SP-fungi 0 0 0 896 1,517 43.0 | 122 (57.8%) 46.4 24.6% 0.8%
SP-bacteria 0 0 0 790 1,126 44.1 | 63 (29.9%) 45.8 25.4% 0%
SP-full_BFIP 0 0 0 744 1,696 45.7 | 125 (59.2%) 49.1 32.0% 0.8%
TR-plants 7 207 207 4,626 5,241 76.0 | 210 (99.5%) 82.3 68.6% 11.0%
TR-invertebrates 0 3 3 3,272 3,954 48.9 | 186 (88.2%) 53.8 37.6% 2.2%
Top 15 species from total: 1,594
SPECIES (25 char) BITS ANNO TOTAL SPECIES BITS ANNO TOTAL
Musa acuminata 97 2 490 Dendrobium catenatum 2 2 70
Musa balbisiana 52 71 312 Macleaya cordata 2 2 38
Elaeis guineensis 12 45 347 Curcuma alismatifolia 2 2 2
Ensete ventricosum 11 14 206 Vitis vinifera 1 3 86
Zingiber officinale 7 7 27 Apostasia shenzhenica 1 3 59
Ananas comosus 3 17 456 Ricinus communis 1 2 41
Meloidogyne enterolobii 3 2 10 Pinus tabuliformis 1 2 4
Anthurium amnicola 2 4 96 Other 13 32 15,519
Gene ontology statistics:
Unique GOs 6,144 Unique hits with GOs 10,632 (86.5%)
Sequences with GOs 208 (98.6%) Seq best hit has GOs 107 (50.7%)
Has goslim_plant 94
biological_process 4,442 (72.3%) is_a 9,866
molecular_function 979 (15.9%) part_of 1,057
cellular_component 723 (11.8%)
EXPRESSION
TPM: (% of 211)
<2.0 2-5 5-10 10-50 50-100 100-1k 1k-5k >=5k
Stem 3 (1%) 2 (1%) 1(<1%) 6 (3%) 7 (3%) 45(21%) 100(47%) 47(22%)
Root 1(<1%) 0 (0%) 1(<1%) 1(<1%) 0 (0%) 60(28%) 101(48%) 47(22%)
Oleaf 6 (3%) 1(<1%) 1(<1%) 11 (5%) 6 (3%) 40(19%) 96(45%) 50(24%)
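The sketch below shows the standard TPM calculation and how values could be binned into the ranges above. It covers a single sample and assumes one read count and one length per sequence. The report lists 5 replicates per tissue, but it does not show how sTCW combines replicates or adjusts lengths. The function names are hypothetical, and so is the treatment of range edges (lower edge included, upper edge excluded).

```python
from bisect import bisect_right

# Range edges from the table header: <2.0, 2-5, 5-10, 10-50, 50-100, 100-1k, 1k-5k, >=5k.
# Assumption: each range includes its lower edge and excludes its upper edge.
TPM_EDGES = [2.0, 5.0, 10.0, 50.0, 100.0, 1_000.0, 5_000.0]
TPM_LABELS = ["<2.0", "2-5", "5-10", "10-50", "50-100", "100-1k", "1k-5k", ">=5k"]


def tpm(counts, lengths):
    """Transcripts per million for one sample: length-normalize, then scale to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates] if total else [0.0] * len(rates)


def bin_tpm(values):
    """Count how many TPM values fall into each range of the table."""
    hist = dict.fromkeys(TPM_LABELS, 0)
    for v in values:
        hist[TPM_LABELS[bisect_right(TPM_EDGES, v)]] += 1
    return hist
```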
Differential expression: (% of 211)
<1E-5 <1E-4 <0.001 <0.01 <0.05
StRo 6 (3%) 17 (8%) 32(15%) 65(31%) 97(46%)
StOl 0 (0%) 2 (1%) 12 (6%) 65(31%) 101(48%)
RoOl 1(<1%) 16 (8%) 37(18%) 74(35%) 110(52%)
Gene ontology enrichment: (% of 6,144)
<1E-5 <1E-4 <0.001 <0.01 <0.05
StRo 0 (0%) 1(<1%) 6(<1%) 59 (1%) 306 (5%)
StOl 0 (0%) 0 (0%) 0 (0%) 1(<1%) 43 (1%)
RoOl 0 (0%) 1(<1%) 2(<1%) 26(<1%) 183 (3%)
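In both p-value tables above, the columns are cumulative: each count includes the counts to its left (StRo, for example, rises from 6 to 97). The sketch below builds one such row. Percentages are rounded to whole numbers, and non-zero counts that round to 0 are shown as <1%. This rule reproduces the figures shown, such as 1 of 211 as <1% and 306 of 6,144 as 5%. Treating each column as strictly "p < threshold" follows the header. The function names are hypothetical.

```python
# Thresholds from the table header. Each column counts p-values strictly below its
# threshold, so every count includes the ones to its left.
THRESHOLDS = [1e-5, 1e-4, 1e-3, 1e-2, 0.05]


def fmt_count(n, denom):
    """Format as 'n (pct%)' with a rounded whole percent; non-zero counts that round to 0 show '<1%'."""
    if n == 0:
        return "0 (0%)"
    pct = round(100 * n / denom)
    return f"{n} (<1%)" if pct == 0 else f"{n} ({pct}%)"


def cumulative_row(pvalues, denom):
    """One table row: counts of p-values below each threshold, as a share of denom."""
    return [fmt_count(sum(p < t for p in pvalues), denom) for t in THRESHOLDS]
```

The denominator is 211 (sequences) for differential expression and 6,144 (GOs) for GO enrichment.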
SEQUENCES
Sequence lengths:
<=100 101-500 501-1000 1001-2000 2001-3000 3001-4000 4001-5000 >5000
0(0%) 37(18%) 84(40%) 72(34%) 8(4%) 8(4%) 2(1%) 0(0%)
Quality:
Sequences with #n>0: 3 (1.4%)
Sequences with #n>10: 1 (0.5%)
ORF lengths:
<=100 101-500 501-1000 1001-2000 2001-3000 3001-4000 4001-5000 >5000
0(0%) 62(29%) 92(44%) 47(22%) 4(2%) 6(3%) 0(0%) 0(0%)
ORF stats: Average length 862
Has Hit 210 (99.5%) Both Ends 69 (32.7%) Multi-frame 6 (2.8%)
Is Longest ORF 192 (91.0%) ORF>=300 190 (90.0%) Stops in Hit 6 (2.8%)
Markov Best Score 210 (99.5%) ORF=Hit 103 (48.8%) >=9 Ns in ORF 1 (<1%)
All of the above 191 (90.5%) with Ends 32 (15.2%)
GC content: 48.65%
Pos1 18.3%  Pos2 13.7%  Pos3 16.6%
          5UTR    CDS   3UTR
%GC      43.76  48.78  36.25
CpG-O/E   0.89   0.71   0.63
Length     21k   182k    25k
AvgLen   100.3  861.7  117.1
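The sketch below illustrates the per-region %GC and CpG-O/E values. It uses the common observed/expected definition, (#CG * length) / (#C * #G), and takes 5UTR, CDS and 3UTR to be the parts of a sequence before, inside and after its selected ORF on the forward strand. Both are assumptions. How sTCW handles Ns and reverse-strand ORFs may differ. The function names are hypothetical.

```python
def gc_percent(seq):
    """Percent G+C among unambiguous bases (assumption: Ns are excluded from the denominator)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return 100.0 * (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G); 0.0 if C or G is absent."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the full dinucleotide count.
    return s.count("CG") * len(s) / (c * g)


def split_regions(seq, orf_start, orf_end):
    """Split a forward-strand sequence at 0-based, half-open ORF coordinates into (5UTR, CDS, 3UTR)."""
    return seq[:orf_start], seq[orf_start:orf_end], seq[orf_end:]
```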
Similar pairs: 50
Nucleotide 16
Translated nucleotide 50
LOCATIONS
Sequences with location: 12 unique locations: 12
Sequences on positive strand: 7 negative strand: 5
Sequences per group:
1 2 3-4 5-7 8-10 11-20 21-30 >30
0 0 3 0 0 0 0 0
-------------------------------------------------------------------
PROCESSING INFORMATION
AnnoDB files:
TYPE TAXO FILE DB-DATE ADD-DATE EXECUTE
sp plants uniprot_sprot_plants.fasta 21-Dec-21 13-Jul-22 diamond --masking 0
sp invertebrates uniprot_sprot_invertebrates.fasta 21-Dec-21 13-Jul-22 diamond --masking 0
sp fungi uniprot_sprot_fungi.fasta 21-Dec-21 13-Jul-22 diamond --masking 0
sp bacteria uniprot_sprot_bacteria.fasta 21-Dec-21 13-Jul-22 diamond --masking 0
sp full_BFIP uniprot_sprot_xBFxIxPxxx.fasta 21-Dec-21 13-Jul-22 diamond --masking 0
tr plants uniprot_trembl_plants.fasta 21-Dec-21 13-Jul-22 diamond --masking 0
tr invertebrates uniprot_trembl_invertebrates.fasta 21-Dec-21 13-Jul-22 diamond --masking 0
Prune: none
Gene ontology: go-basic.obo-Nov2021 GOdb: go_demo [GOs added with sTCW v4.0.3]
GO slim: goslim_plant
ORF finder:
Use ATG only for start site
Rule 1: Use Good hit: E-value <=1E-10 or Sim >= 20%
Rule 2: Use longest ORF if Log Ratio > 0.5
Rule 3: Use best Markov score if Log Ratio > 0.4
Train using best hits (204 seqs, 174.5k bases)
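The ATG-only start rule and the Rule 1 threshold can be sketched as follows. This is an illustration under stated assumptions, not sTCW's ORF finder:

- It scans only the three forward frames.
- It reports only ORFs that have both an ATG start and a stop codon.
- It does not model Rules 2 and 3, because the report does not define the log ratio they use.

The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_atg_orf(seq):
    """Longest ATG...stop ORF on the forward strand as 0-based, half-open (start, end), or None.

    Sketch only: reverse-strand frames and ORFs without a stop codon are ignored.
    """
    s = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(s) - 2, 3):
            codon = s[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF (ATG-only start sites)
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


def is_good_hit(evalue, percent_sim):
    """Rule 1 threshold from the report: E-value <= 1E-10 or similarity >= 20%."""
    return evalue <= 1e-10 or percent_sim >= 20.0
```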
Differential expression computation:
Column Method Conditions
StRo edgeRglm.R CPM>1>=2 Stem : Root
StOl edgeRglm.R CPM>1>=2 Stem : Oleaf
RoOl edgeRglm.R CPM>1>=2 Root : Oleaf
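The filter label CPM>1>=2 is read here as the usual edgeR pre-filter: keep a sequence only if its counts-per-million exceed 1 in at least 2 samples. That reading is an assumption. So is computing library sizes as raw column totals, since edgeR can also apply TMM normalization factors. A minimal sketch with hypothetical function names:

```python
def cpm(counts):
    """Counts-per-million for a sequences x samples count matrix (list of rows).

    Assumption: library size = raw column total, with no normalization factors.
    """
    lib_sizes = [sum(col) for col in zip(*counts)]
    return [[c / lib * 1e6 if lib else 0.0 for c, lib in zip(row, lib_sizes)] for row in counts]


def passes_cpm_filter(counts, min_cpm=1.0, min_samples=2):
    """True for each sequence whose CPM exceeds min_cpm in at least min_samples samples."""
    return [sum(v > min_cpm for v in row) >= min_samples for row in cpm(counts)]
```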
GO enrichment computation:
Column Method Cutoff
StRo goSeqNoFDR.R 5.0e-02
StOl goSeqNoFDR.R 5.0e-02
RoOl goSeqNoFDR.R 5.0e-02
-------------------------------------------------------------------
LEGEND:
annoDBs:
ANNODB Named DBTYPE-TAXO, i.e. the database type and taxonomy
ONLY #Seqs that hit the annoDB and no others
BITS #Seqs with the overall best bit-score from the annoDB
ANNO #Seqs with the overall best annotation from the annoDB
UNIQUE #Unique hits to the annoDB
TOTAL #Total seq-hit pairs for the annoDB
AVG %SIM Average percent similarity of the total seq-hit pairs
HIT-SEQ #Seqs with at least one hit from the annoDB (percent of #Seqs in parentheses)
BEST HIT The following columns refer to the best hit (Rank=1):
AVG %SIM Average percent similarity of the best hit seq-hit pairs
Cover>=N Percent of HIT-SEQ sequences whose best hit has similarity >= N% and hit coverage >= N%
#Seqs is listed at top of overview
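A sketch of the Cover>=N column as defined above. It assumes that each sequence's best hit carries its percent similarity and the percent of the hit's length covered by the alignment. The dataclass and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BestHit:
    percent_sim: float  # similarity of the best (Rank=1) seq-hit alignment
    hit_cover: float    # percent of the hit's length covered by that alignment


def cover_pct(best_hits, n):
    """Percent of HIT-SEQ sequences whose best hit has similarity >= n% and hit coverage >= n%."""
    if not best_hits:
        return 0.0
    ok = sum(1 for h in best_hits if h.percent_sim >= n and h.hit_cover >= n)
    return round(100.0 * ok / len(best_hits), 1)
```

For SP-plants, 61.7% of the 193 HIT-SEQ sequences corresponds to 119 sequences.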
Best annotation:
The best hit whose description does not contain uninformative words such as 'uncharacterized protein'