Project: tra   #Seqs: 211   #Hits: 12,296   #GOs: 6,144   TPM  Seq-DE  GO-Enrich  Pairs

13-Jul-22  Build Database
           Sequences loaded from external source
13-Jul-22  Last Annotation with sTCW v4.0.3

INPUT
   Counts:
      SEQID  ID          SIZE  TITLE            #REPS
      tra    Stem   1,865,201  Tissue Stem          5
      tra    Root     955,097  Tissue Root          5
      tra    Oleaf    474,551  Tissue Old Leaf      5

   Sequences:
      SEQID  SIZE  TITLE                             AVG-len  MED-len
      tra     211  Demo of assembled Illumina reads    1,079      861

ANNOTATION
   Hit statistics:
      Sequences with hits       210 (99.5%)    Bases covered by hit  180,339 (79.2%)
      Unique hits            12,296            Total bases           227,683
      Total sequence hits    17,763

   Annotation databases (annoDBs): 7 (see Legend below)
                                                                    |                   ----------- Rank=1 -----------
      ANNODB            ONLY  BITS  ANNO  UNIQUE   TOTAL  AVG %SIM  |  HAS HIT (%Seqs)  AVG %SIM  COVER>=50  COVER>=90
      SP-plants            0     0     0   1,344   2,959      53.8  |      193 (91.5%)      70.0      61.7%       3.1%
      SP-invertebrates     0     0     0     624   1,270      42.5  |      126 (59.7%)      47.5      30.2%       0.8%
      SP-fungi             0     0     0     896   1,517      43.0  |      122 (57.8%)      46.4      24.6%       0.8%
      SP-bacteria          0     0     0     790   1,126      44.1  |       63 (29.9%)      45.8      25.4%         0%
      SP-full_BFIP         0     0     0     744   1,696      45.7  |      125 (59.2%)      49.1      32.0%       0.8%
      TR-plants            7   207   207   4,626   5,241      76.0  |      210 (99.5%)      82.3      68.6%      11.0%
      TR-invertebrates     0     3     3   3,272   3,954      48.9  |      186 (88.2%)      53.8      37.6%       2.2%

   Top 15 species from total: 1,594
      SPECIES (25 char)          BITS  ANNO   TOTAL     SPECIES (25 char)          BITS  ANNO   TOTAL
      Musa acuminata               97     2     490     Dendrobium catenatum          2     2      70
      Musa balbisiana              52    71     312     Macleaya cordata              2     2      38
      Elaeis guineensis            12    45     347     Curcuma alismatifolia         2     2       2
      Ensete ventricosum           11    14     206     Vitis vinifera                1     3      86
      Zingiber officinale           7     7      27     Apostasia shenzhenica         1     3      59
      Ananas comosus                3    17     456     Ricinus communis              1     2      41
      Meloidogyne enterolobii       3     2      10     Pinus tabuliformis            1     2       4
      Anthurium amnicola            2     4      96     Other                        13    32  15,519

   Gene ontology statistics:
      Unique GOs              6,144
      Unique hits with GOs   10,632 (86.5%)
      Sequences with GOs        208 (98.6%)
      Seq best hit has GOs      107 (50.7%)
      Has goslim_plant           94

      biological_process   4,442 (72.3%)    is_a     9,866
      molecular_function     979 (15.9%)    part_of  1,057
      cellular_component     723 (11.8%)

EXPRESSION
   TPM: (% of 211)
                   <2.0        2-5       5-10      10-50     50-100     100-1k      1k-5k       >=5k
      Stem       3 (1%)     2 (1%)    1 (<1%)     6 (3%)     7 (3%)   45 (21%)  100 (47%)   47 (22%)
      Root      1 (<1%)     0 (0%)    1 (<1%)    1 (<1%)     0 (0%)   60 (28%)  101 (48%)   47 (22%)
      Oleaf      6 (3%)    1 (<1%)    1 (<1%)    11 (5%)     6 (3%)   40 (19%)   96 (45%)   50 (24%)

   Differential expression: (% of 211)
                  <1E-5      <1E-4     <0.001      <0.01      <0.05
      StRo       6 (3%)    17 (8%)   32 (15%)   65 (31%)   97 (46%)
      StOl       0 (0%)     2 (1%)    12 (6%)   65 (31%)  101 (48%)
      RoOl      1 (<1%)    16 (8%)   37 (18%)   74 (35%)  110 (52%)

   Gene ontology enrichment: (% of 6,144)
                  <1E-5      <1E-4     <0.001      <0.01      <0.05
      StRo       0 (0%)    1 (<1%)    6 (<1%)    59 (1%)   306 (5%)
      StOl       0 (0%)     0 (0%)     0 (0%)    1 (<1%)    43 (1%)
      RoOl       0 (0%)    1 (<1%)    2 (<1%)   26 (<1%)   183 (3%)

SEQUENCES
   Sequence lengths:
            <=100    101-500   501-1000  1001-2000  2001-3000  3001-4000  4001-5000      >5000
           0 (0%)   37 (18%)   84 (40%)   72 (34%)     8 (4%)     8 (4%)     2 (1%)     0 (0%)

   Quality:
      Sequences with #n>0:    3 (1.4%)
      Sequences with #n>10:   1 (0.5%)

   ORF lengths:
            <=100    101-500   501-1000  1001-2000  2001-3000  3001-4000  4001-5000      >5000
           0 (0%)   62 (29%)   92 (44%)   47 (22%)     4 (2%)     6 (3%)     0 (0%)     0 (0%)

   ORF stats:
      Average length      862           Has Hit               210 (99.5%)   Both Ends      69 (32.7%)
      Multi-frame           6 (2.8%)    Is Longest ORF        192 (91.0%)   ORF>=300      190 (90.0%)
      Stops in Hit          6 (2.8%)    Markov Best Score     210 (99.5%)   ORF=Hit       103 (48.8%)
      >=9 Ns in ORF         1 (<1%)     All of the above      191 (90.5%)   with Ends      32 (15.2%)

   GC content: 48.65%
      Pos1  18.3%
      Pos2  13.7%
      Pos3  16.6%

                    5UTR      CDS     3UTR
      %GC          43.76    48.78    36.25
      CpG-O/E       0.89     0.71     0.63
      Length         21k     182k      25k
      AvgLen       100.3    861.7    117.1

   Similar pairs: 50
      Nucleotide              16
      Translated nucleotide   50

LOCATIONS
   Sequences with location: 12    unique locations: 12
   Sequences on positive strand: 7    negative strand: 5

   Sequences per group:
              1        2      3-4      5-7     8-10    11-20    21-30      >30
              0        0        3        0        0        0        0        0

-------------------------------------------------------------------
PROCESSING INFORMATION
   AnnoDB files:
      Type  Taxo           FILE                                DB DATE    ADD DATE   EXECUTE
      sp    plants         uniprot_sprot_plants.fasta          21-Dec-21  13-Jul-22  diamond --masking 0
      sp    invertebrates  uniprot_sprot_invertebrates.fasta   21-Dec-21  13-Jul-22  diamond --masking 0
      sp    fungi          uniprot_sprot_fungi.fasta           21-Dec-21  13-Jul-22  diamond --masking 0
      sp    bacteria       uniprot_sprot_bacteria.fasta        21-Dec-21  13-Jul-22  diamond --masking 0
      sp    full_BFIP      uniprot_sprot_xBFxIxPxxx.fasta      21-Dec-21  13-Jul-22  diamond --masking 0
      tr    plants         uniprot_trembl_plants.fasta         21-Dec-21  13-Jul-22  diamond --masking 0
      tr    invertebrates  uniprot_trembl_invertebrates.fasta  21-Dec-21  13-Jul-22  diamond --masking 0

   Prune: none

   Gene ontology: go-basic.obo-Nov2021
      GOdb: go_demo   [GOs added with sTCW v4.0.3]
      GO slim: goslim_plant

   ORF finder:
      Use ATG only for start site
      Rule 1: Use Good hit: E-value <=1E-10 or Sim >= 20%
      Rule 2: Use longest ORF if Log Ratio > 0.5
      Rule 3: Use best Markov score if Log Ratio > 0.4
      Train using best hits (204 seqs, 174.5k bases)

   Differential expression computation:
      Column  Method               Conditions
      StRo    edgeRglm.R CPM>1>=2  Stem : Root
      StOl    edgeRglm.R CPM>1>=2  Stem : Oleaf
      RoOl    edgeRglm.R CPM>1>=2  Root : Oleaf

   GO enrichment computation:
      Column  Method         Cutoff
      StRo    goSeqNoFDR.R   5.0e-02
      StOl    goSeqNoFDR.R   5.0e-02
      RoOl    goSeqNoFDR.R   5.0e-02

-------------------------------------------------------------------
LEGEND:
   annoDBs:
      ANNODB     ANNODB is DBTYPE-TAXO, i.e. the DB type and the taxonomy
      ONLY       #Seqs that hit the annoDB and no others
      BITS       #Seqs with the overall best bit-score from the annoDB
      ANNO       #Seqs with the overall best annotation from the annoDB
      UNIQUE     #Unique hits to the annoDB
      TOTAL      #Total seq-hit pairs for the annoDB
      AVG %SIM   Average percent similarity of the total seq-hit pairs
      HAS HIT    #Seqs (and percent of #Seqs) that have at least one hit from the annoDB
   BEST HIT: the following columns refer to the best hit (Rank=1):
      AVG %SIM   Average percent similarity of the best hit seq-hit pairs
      Cover>=N   Percent of the HAS HIT sequences whose best hit has similarity >=N% and hit coverage >=N%
   #Seqs is listed at the top of the overview.
   Best annotation: the description may not contain words such as 'uncharacterized protein'.
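The Rank=1 columns of the annoDB table can be recomputed from each sequence's best hit in one annoDB. The following is a minimal sketch, not sTCW's code: the `BestHit` record, its fields and `best_hit_columns` are hypothetical names, and "hit coverage" is assumed to mean the percent of the hit covered by the alignment. Following the legend, Cover>=N is computed as a percent of the HAS HIT sequences, not of all #Seqs.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass
class BestHit:
    """Hypothetical Rank=1 hit of one sequence against one annoDB."""
    sim: float        # percent similarity of the seq-hit alignment
    hit_cover: float  # assumed: percent of the hit covered by the alignment


def best_hit_columns(
    best_hits: Dict[str, Optional[BestHit]],
    n_seqs: int,
    levels: Sequence[int] = (50, 90),
) -> Tuple[int, float, float, Dict[int, float]]:
    """Return HAS HIT count, HAS HIT %, AVG %SIM (Rank=1) and Cover>=N %.

    best_hits maps each sequence ID to its best hit in this annoDB,
    or to None when the sequence has no hit there.
    """
    hits = [h for h in best_hits.values() if h is not None]
    has_hit = len(hits)
    pct_has_hit = 100.0 * has_hit / n_seqs
    avg_sim = sum(h.sim for h in hits) / has_hit if has_hit else 0.0
    cover: Dict[int, float] = {}
    for n in levels:
        passing = sum(1 for h in hits if h.sim >= n and h.hit_cover >= n)
        # Per the legend, Cover>=N is relative to HAS HIT, not to all #Seqs.
        cover[n] = 100.0 * passing / has_hit if has_hit else 0.0
    return has_hit, pct_has_hit, avg_sim, cover


if __name__ == "__main__":
    demo = {
        "s1": BestHit(sim=95.0, hit_cover=92.0),
        "s2": BestHit(sim=60.0, hit_cover=40.0),
        "s3": BestHit(sim=70.0, hit_cover=55.0),
        "s4": None,  # no hit in this annoDB
    }
    print(best_hit_columns(demo, n_seqs=4))
```

Because both thresholds must hold, a best hit with high similarity but short coverage (like `s2` above) does not count toward COVER>=50.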
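The EXPRESSION TPM table counts how many of the 211 sequences fall into each TPM range per condition. Below is a minimal sketch of the standard TPM definition (counts divided by length, then rescaled so a sample sums to one million) together with the report's binning. It is an assumption that sTCW computes TPM exactly this way (for example raw versus effective length, and how the 5 replicates per condition are combined, are not stated in this report), and treating each bin's lower bound as inclusive is also an assumption; `tpm`, `tpm_histogram` and `TPM_BINS` are hypothetical names.

```python
from typing import Dict, List, Sequence

# Bin labels and lower bounds taken from the TPM table header.
# Assumption: a value equal to a lower bound falls into that bin.
TPM_BINS = [
    ("<2.0", 0.0), ("2-5", 2.0), ("5-10", 5.0), ("10-50", 10.0),
    ("50-100", 50.0), ("100-1k", 100.0), ("1k-5k", 1000.0), (">=5k", 5000.0),
]


def tpm(counts: Sequence[float], lengths: Sequence[int]) -> List[float]:
    """Transcripts per million for one sample.

    counts[i] is the read count of sequence i and lengths[i] its length in bases.
    """
    rates = [c / l for c, l in zip(counts, lengths)]  # reads per base
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    return [r / total * 1e6 for r in rates]


def tpm_histogram(values: Sequence[float]) -> Dict[str, int]:
    """Count sequences per TPM bin, in the column order of the report."""
    hist = {label: 0 for label, _ in TPM_BINS}
    for v in values:
        # Last bin whose lower bound is <= v; TPM values are never negative.
        label = [lab for lab, low in TPM_BINS if v >= low][-1]
        hist[label] += 1
    return hist


if __name__ == "__main__":
    # Counts proportional to length, so all three sequences get the same TPM.
    print(tpm([30, 10, 60], [1500, 500, 3000]))
    print(tpm_histogram([1.0, 3.0, 7.5, 2000.0, 6000.0]))
```

Normalizing by length before rescaling is what keeps a long transcript from looking more highly expressed only because it collects more reads.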
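The Differential expression and Gene ontology enrichment tables are cumulative: each column counts the p-values below its cutoff, so counts never decrease from left to right, and percentages are over 211 sequences or 6,144 GOs respectively. The p-values themselves come from the edgeRglm.R and goSeqNoFDR.R scripts listed under PROCESSING INFORMATION; the sketch below only tabulates them. `threshold_counts` and `fmt_cell` are hypothetical names, and the rounding rule in `fmt_cell` is inferred from the printed cells: the percentage is rounded to a whole number and shown as `<1%` only when a nonzero count rounds to 0 (59 of 6,144 prints as `1%`, while 26 of 6,144 prints as `<1%`).

```python
from typing import List, Sequence, Tuple

# Cutoffs from the column headers: <1E-5 <1E-4 <0.001 <0.01 <0.05
CUTOFFS = (1e-5, 1e-4, 1e-3, 1e-2, 0.05)


def threshold_counts(pvalues: Sequence[float], n_total: int) -> List[Tuple[int, float]]:
    """For each cutoff, count p-values strictly below it and its percent of n_total."""
    row = []
    for cutoff in CUTOFFS:
        n = sum(1 for p in pvalues if p < cutoff)
        row.append((n, 100.0 * n / n_total))
    return row


def fmt_cell(n: int, pct: float) -> str:
    """Format a cell like the report, ignoring column padding (inferred rule)."""
    shown = round(pct)
    if n > 0 and shown == 0:
        return f"{n} (<1%)"
    return f"{n} ({shown}%)"


if __name__ == "__main__":
    pvals = [1e-6, 5e-4, 0.02, 0.3]  # made-up p-values for illustration
    print([fmt_cell(n, pct) for n, pct in threshold_counts(pvals, len(pvals))])
```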
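The GC content block reports %GC and a CpG observed/expected ratio (CpG-O/E) for the 5UTR, CDS and 3UTR regions. A widely used definition of CpG O/E is #CG x length / (#C x #G). The sketch below assumes that definition and assumes that N and other ambiguity codes are left out of the %GC denominator; the report confirms neither, and it also does not say whether regions are pooled across sequences or averaged per sequence. `gc_percent` and `cpg_obs_exp` are hypothetical names.

```python
def gc_percent(seq: str) -> float:
    """Percent G+C among A/C/G/T bases (assumption: N and other codes excluded)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (s.count("G") + s.count("C")) / acgt


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: #CG * length / (#C * #G)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    # str.count is non-overlapping, which is exact here: "CG" cannot overlap itself.
    cg = s.count("CG")
    return cg * len(s) / (c * g)


if __name__ == "__main__":
    region = "ATGCGCGTTACG"  # made-up region sequence
    print(round(gc_percent(region), 2), round(cpg_obs_exp(region), 2))
```

A ratio below 1, as in all three regions of this report, means CG dinucleotides occur less often than the region's C and G content alone would predict.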