--------------- Annotate Sequences --------------- 28-Feb-22 06:29:59
Check database sTCW_demoTra
sTCW ID: tra
Database: NT-sTCW
Create: 2022-02-28
User: cari
Path: /Users/cari/Workspace/github/TCW/projects/demoTra
Database has no annotation.
Checking annoDB files
DB#1 diamond SP AA: projects/DBfasta/UniProt_demo/sp_plants/uniprot_sprot_plants.fasta
DB#2 diamond SP AA: projects/DBfasta/UniProt_demo/sp_invertebrates/uniprot_sprot_invertebrates.fasta
DB#3 diamond SP AA: projects/DBfasta/UniProt_demo/sp_fungi/uniprot_sprot_fungi.fasta
DB#4 diamond SP AA: projects/DBfasta/UniProt_demo/sp_bacteria/uniprot_sprot_bacteria.fasta
DB#5 diamond SP AA: projects/DBfasta/UniProt_demo/sp_full/uniprot_sprot_xBFxIxPxxx.fasta
DB#6 diamond TR AA: projects/DBfasta/UniProt_demo/tr_plants/uniprot_trembl_plants.fasta
DB#7 diamond TR AA: projects/DBfasta/UniProt_demo/tr_invertebrates/uniprot_trembl_invertebrates.fasta
Pairs blastn: projects/demoTra/hitResults/tra_seqNT.fa
Pairs tblastx: projects/demoTra/hitResults/tra_seqNT.fa
Checking for existing tab files
Check complete: Run Search 9 Use existing 0
Check GO database
GO database = go_demo
Valid goDB exists 'go_demo'
Add GO_slim_subset goslim_plant
Start annotating sequences 28-Feb-22 06:29:59
211 Sequences loaded
Remove {ECO...} from UniProt descriptions
Annotate sequences with sequence hits from 7 DB file(s)
Creating /Users/cari/Workspace/github/TCW/projects/demoTra/hitResults directory
Create sequence file: projects/demoTra/hitResults/tra_seqNT.fa
Wrote 211 sequence records
DB#1 uniprot_sprot_plants.fasta 1.7Mb 28-Feb-22 06:29:59
Using existing formated files
Ext/mac/diamond/diamond blastx -q projects/demoTra/hitResults/tra_seqNT.fa -d projects/DBfasta/UniProt_demo/sp_plants/uniprot_sprot_plants.fasta.dmnd -o projects/demoTra/hitResults/tra_SPpla.dmnd.tab --masking 0 -p 24 --quiet
Complete diamond 0m:0s
DB#1 hits: tra_SPpla.dmnd.tab
2,959 seq-hit pairs
193 annotated sequences 0m:0s (3Mb)
DB#1 descriptions: uniprot_sprot_plants.fasta
1,344 unique hits descriptions added from 2,775 0m:0s (2Mb)
Complete adding DB#1 0m:1s
DB#2 uniprot_sprot_invertebrates.fasta 1.5Mb 28-Feb-22 06:30:01
Using existing formated files
Ext/mac/diamond/diamond blastx -q projects/demoTra/hitResults/tra_seqNT.fa -d projects/DBfasta/UniProt_demo/sp_invertebrates/uniprot_sprot_invertebrates.fasta.dmnd -o projects/demoTra/hitResults/tra_SPinv.dmnd.tab --masking 0 -p 24 --quiet
Complete diamond 0m:0s
DB#2 hits: tra_SPinv.dmnd.tab
1,271 seq-hit pairs
126 annotated sequences 0m:0s (2Mb)
DB#2 descriptions: uniprot_sprot_invertebrates.fasta
624 unique hits descriptions added from 2,188 0m:0s (2Mb)
Complete adding DB#2 0m:0s
DB#3 uniprot_sprot_fungi.fasta 1.6Mb 28-Feb-22 06:30:02
Using existing formated files
Ext/mac/diamond/diamond blastx -q projects/demoTra/hitResults/tra_seqNT.fa -d projects/DBfasta/UniProt_demo/sp_fungi/uniprot_sprot_fungi.fasta.dmnd -o projects/demoTra/hitResults/tra_SPfun.dmnd.tab --masking 0 -p 24 --quiet
Complete diamond 0m:0s
DB#3 hits: tra_SPfun.dmnd.tab
1,516 seq-hit pairs
121 annotated sequences 0m:0s (2Mb)
DB#3 descriptions: uniprot_sprot_fungi.fasta
895 unique hits descriptions added from 2,349 0m:0s (2Mb)
Complete adding DB#3 0m:1s
DB#4 uniprot_sprot_bacteria.fasta 1.4Mb 28-Feb-22 06:30:03
Using existing formated files
Ext/mac/diamond/diamond blastx -q projects/demoTra/hitResults/tra_seqNT.fa -d projects/DBfasta/UniProt_demo/sp_bacteria/uniprot_sprot_bacteria.fasta.dmnd -o projects/demoTra/hitResults/tra_SPbac.dmnd.tab --masking 0 -p 24 --quiet
Complete diamond 0m:0s
DB#4 hits: tra_SPbac.dmnd.tab
1,126 seq-hit pairs
63 annotated sequences 0m:0s (2Mb)
DB#4 descriptions: uniprot_sprot_bacteria.fasta
790 unique hits descriptions added from 2,176 0m:0s (2Mb)
Complete adding DB#4 0m:1s
DB#5 uniprot_sprot_xBFxIxPxxx.fasta 2.6Mb 28-Feb-22 06:30:04
Using existing formated files
Ext/mac/diamond/diamond blastx -q projects/demoTra/hitResults/tra_seqNT.fa -d projects/DBfasta/UniProt_demo/sp_full/uniprot_sprot_xBFxIxPxxx.fasta.dmnd -o projects/demoTra/hitResults/tra_SPful.dmnd.tab --masking 0 -p 24 --quiet
Complete diamond 0m:0s
DB#5 hits: tra_SPful.dmnd.tab
1,742 seq-hit pairs
144 annotated sequences 0m:0s (2Mb)
DB#5 descriptions: uniprot_sprot_xBFxIxPxxx.fasta
740 unique hits descriptions added from 3,546 0m:0s (2Mb)
Complete adding DB#5 0m:1s
DB#6 uniprot_trembl_plants.fasta 10.3Mb 28-Feb-22 06:30:05
Using existing formated files
Ext/mac/diamond/diamond blastx -q projects/demoTra/hitResults/tra_seqNT.fa -d projects/DBfasta/UniProt_demo/tr_plants/uniprot_trembl_plants.fasta.dmnd -o projects/demoTra/hitResults/tra_TRpla.dmnd.tab --masking 0 -p 24 --quiet
Complete diamond 0m:0s
DB#6 hits: tra_TRpla.dmnd.tab
5,235 seq-hit pairs
210 annotated sequences 0m:0s (3Mb)
DB#6 descriptions: uniprot_trembl_plants.fasta
4,614 unique hits descriptions added from 15,124 0m:3s (3Mb)
Complete adding DB#6 0m:4s
DB#7 uniprot_trembl_invertebrates.fasta 9.9Mb 28-Feb-22 06:30:10
Using existing formated files
Ext/mac/diamond/diamond blastx -q projects/demoTra/hitResults/tra_seqNT.fa -d projects/DBfasta/UniProt_demo/tr_invertebrates/uniprot_trembl_invertebrates.fasta.dmnd -o projects/demoTra/hitResults/tra_TRinv.dmnd.tab --masking 0 -p 24 --quiet
Complete diamond 0m:0s
DB#7 hits: tra_TRinv.dmnd.tab
3,700 seq-hit pairs
181 annotated sequences 0m:0s (3Mb)
DB#7 descriptions: uniprot_trembl_invertebrates.fasta
2,950 unique hits descriptions added from 14,024 0m:2s (3Mb)
Complete adding DB#7 0m:3s
Process all hits for 210 sequences
4 Sequences with hits to multiple frames
4 Sequences with hits to different orientations
Finish filter 0m:3s (3Mb)
Creating species table
Read species per sequence from database
17,549 total seq-hits
1,571 total species
Insert species counts into database
Insert species totals per database
Finish creating species table 0m:0s (4Mb)
Finished 210 annotated 1 unannotated 0m:17s
Annotate with GC and ORF
Load all sequence from database
211 Sequences to process
210 With hits 1 With no hit
210 Good hit (%Sim>=20 || E-value>=1E-10)
61 Great hit (%Sim>=60 && %Hit>=95)
6 Hits with stops (find longest non-stop hit region)
Complete load
Start computation of coding potential
204 hit sequences 7 Ignored
Find longest unique sequences with best hits
0 Non-unique from longest 204 sequences
Train with 204 unique longest sequences (59)
174,483 Bases used for training
Compute Codon frequency and write to projects/demoTra/orfFiles/scoreCodon.txt
Compute Markov loglikelihood and write to projects/demoTra/orfFiles/scoreMarkov.txt
Base Frequencies: a:0.265 c:0.235 t:0.265 g:0.235
Save training results to database
Complete training 0m:3s (4Mb)
Start ORF computation
Writing ORF information to database and files in projects/demoTra/orfFiles
Complete ORF computation 0m:1s (5Mb)
Save all best ORFs to the database
Save 1266 all frame ORFs to the database
Finish saving ORF data 0m:0s (5Mb)
ORF Stats: Average length 862
Has Hit              210  (99.5%)    Both Ends             69  (32.7%)    Multi-frame            4  (1.9%)
Is Longest ORF       192  (91.0%)    ORF>=300             190  (90.0%)    Stops in Hit           6  (2.8%)
Markov Best Score    210  (99.5%)    ORF=Hit              109  (51.7%)    >=9 Ns in ORF          1  (<1%)
All of the above     191  (90.5%)      with Ends           32  (15.2%)
Additional ORF info                  For seqs with hit    210  (99.5%)    ORF=Hit with Ends     32  (15.2%)
One End              169  (80.1%)    Both Ends             69  (32.9%)    ORF>=300              31  (96.9%)
Markov Good Frame    210  (99.5%)    Markov Good Frame    209  (99.5%)    Markov Good Frame     32  (100.0%)
ORF=Hit              109  (51.7%)    Markov Best Score    209  (99.5%)    Markov Best Score     32  (100.0%)
ORF~Hit               28  (13.3%)    Is Longest ORF       192  (91.4%)    Is Longest ORF        31  (96.9%)
ORF>Hit               69  (32.7%)    Longest & Markov     191  (91.0%)    Longest & Markov      31  (96.9%)
  with Ends           19  (9.0%)     Not hit frame          0             Sim>=90               14  (43.8%)
Frame: 3(13.3%) 2(19.9%) 1(22.3%) -1(17.1%) -2(15.2%) -3(12.3%)
Both Ends: Has Start and Stop codon
ORF=Hit with ends: ORF coordinates=Hit coordinates with ends
Markov Best Score: Best score from best ORF for each of 6 frames
Markov Good Frame: Score>0 and best score from 6 RFs of selected ORF
GC Content: 48.65%
Exceptions: 0
Wrote 252 ORFs to allGoodORFs.pep.fa and allGoodORFs.scores.txt
Complete annotation with ORF and GC 0m:5s
Finished annotating sequences 0m:22s
Start creating Pairs 28-Feb-22 06:30:22
Running pairs blastn
Format file for blast
/Users/cari/Workspace/github/TCW/Ext/mac/blast/makeblastdb -dbtype nucl -in projects/demoTra/hitResults/tra_seqNT.fa
Complete formatting 0m:0s
/Users/cari/Workspace/github/TCW/Ext/mac/blast/blastn -query projects/demoTra/hitResults/tra_seqNT.fa -db projects/demoTra/hitResults/tra_seqNT.fa -out projects/demoTra/hitResults/tra_self_blastn.tab -outfmt 6 -evalue 1e-05 -max_hsps 1 -max_target_seqs 25 -num_threads 24
Complete blastn 0m:0s
Running pairs tblastx
Using existing formated files
/Users/cari/Workspace/github/TCW/Ext/mac/blast/tblastx -query projects/demoTra/hitResults/tra_seqNT.fa -db projects/demoTra/hitResults/tra_seqNT.fa -out projects/demoTra/hitResults/tra_self_tblastx.tab -outfmt 6 -evalue 1e-05 -max_hsps 1 -max_target_seqs 25 -num_threads 24
Complete tblastx 0m:0s
Find pairs to align
20 Pairs from blastn
395 Pairs from tblastx
Aligning best 50 out of 415 pairs, due to Pairs limit in Options
Finished 50 alignments 0m:2s
Finished pairwise comparison 0m:3s
Start GO update 28-Feb-22 06:30:26
Create database GO tables
Computing GOs for:
211 Sequence
11,957 Unique hits
Add GO/Interpro/Kegg/Pfam/EC to unique hits table
Transferring data from a table with 46,440 entries
10,332 GO
10,860 Interpro
4,575 KEGG
10,594 PFam
4,895 EC 0m:0s (86Mb)
Build Hit-GO table
Get Hits
10,332 hits to process 0m:0s (89Mb)
Hit to GO mapping
2,989 assigned GOs 0m:2s (93Mb)
Insert into Hit-GO table...
50,459 Hit-GO pairs 0m:8s (93Mb)
Find all inherited...
6,076 assigned and inherited GOs 0m:4s (114Mb)
Build Seq-GO table ...
Insert into Seq-GO table...
173 sequences have bestBits or bestAnno with GOs
35 sequences do not have bestBits or bestAnno with GOs
3 sequences have no GO 0m:11s (114Mb)
Update database with best Hit with GO per sequence...
208 update sequences with best Hit with GO 0m:0s (87Mb)
Build GO tables
Create graph_path from go_demo for GOs
6,076 processed 0m:3s (87Mb)
Create GO information table
6,076 added unique GOs 0m:3s (87Mb)
Add Slim Subset goslim_plant
97 Slims in goslim_plant
94 Added Slims 0m:0s (86Mb)
Finish GO update 0m:37s
End annotation for demoTra 1m:5s
-----------------------------------------------------