[2023-03-16 10:00:16,240] [INFO] DFAST_QC pipeline started.
[2023-03-16 10:00:16,240] [INFO] DFAST_QC version: 0.5.7
[2023-03-16 10:00:16,240] [INFO] DQC Reference Directory: /var/lib/cwl/stgda40fb41-339e-4648-a2c8-cf059f3df029/dqc_reference
[2023-03-16 10:00:17,331] [INFO] ===== Start taxonomy check using ANI =====
[2023-03-16 10:00:17,332] [INFO] Task started: Prodigal
[2023-03-16 10:00:17,332] [INFO] Running command: cat /var/lib/cwl/stg02dd5823-603e-4f05-9d82-a23cc7da5f58/OceanDNA-b30025.fa | prodigal -d OceanDNA-b30025/cds.fna -a OceanDNA-b30025/protein.faa -g 11 -q > /dev/null
[2023-03-16 10:00:40,895] [INFO] Task succeeded: Prodigal
[2023-03-16 10:00:40,895] [INFO] Task started: HMMsearch
[2023-03-16 10:00:40,895] [INFO] Running command: hmmsearch --tblout OceanDNA-b30025/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stgda40fb41-339e-4648-a2c8-cf059f3df029/dqc_reference/reference_markers.hmm OceanDNA-b30025/protein.faa > /dev/null
[2023-03-16 10:00:41,092] [INFO] Task succeeded: HMMsearch
[2023-03-16 10:00:41,093] [WARNING] Found 5/6 markers. [/var/lib/cwl/stg02dd5823-603e-4f05-9d82-a23cc7da5f58/OceanDNA-b30025.fa]
[2023-03-16 10:00:41,119] [INFO] Query marker FASTA was written to OceanDNA-b30025/markers.fasta
[2023-03-16 10:00:41,120] [INFO] Task started: Blastn
[2023-03-16 10:00:41,120] [INFO] Running command: blastn -query OceanDNA-b30025/markers.fasta -db /var/lib/cwl/stgda40fb41-339e-4648-a2c8-cf059f3df029/dqc_reference/reference_markers.fasta -out OceanDNA-b30025/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-03-16 10:00:41,691] [INFO] Task succeeded: Blastn
[2023-03-16 10:00:41,693] [INFO] Selected 25 target genomes.
[2023-03-16 10:00:41,693] [INFO] Target genome list was writen to OceanDNA-b30025/target_genomes.txt
[2023-03-16 10:00:41,701] [INFO] Task started: fastANI
[2023-03-16 10:00:41,701] [INFO] Running command: fastANI --query /var/lib/cwl/stg02dd5823-603e-4f05-9d82-a23cc7da5f58/OceanDNA-b30025.fa --refList OceanDNA-b30025/target_genomes.txt --output OceanDNA-b30025/fastani_result.tsv --threads 1
[2023-03-16 10:00:58,129] [INFO] Task succeeded: fastANI
[2023-03-16 10:00:58,130] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stgda40fb41-339e-4648-a2c8-cf059f3df029/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-03-16 10:00:58,130] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stgda40fb41-339e-4648-a2c8-cf059f3df029/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-03-16 10:00:58,143] [INFO] Found 25 fastANI hits (1 hits with ANI > threshold)
[2023-03-16 10:00:58,144] [INFO] The taxonomy check result is classified as 'conclusive'.
[2023-03-16 10:00:58,144] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Shimia thalassica	strain=CECT 7735	GCA_001458215.1	1715693	1715693	type	True	97.6841	1111	1150	95	conclusive
Shimia sediminis	strain=ZQ172	GCA_003990645.1	2497945	2497945	type	True	78.1922	360	1150	95	below_threshold
Shimia gijangensis	strain=DSM 100564	GCA_900142085.1	1470563	1470563	type	True	77.8636	319	1150	95	below_threshold
Shimia haliotis	strain=DSM 28453	GCA_900114415.1	1280847	1280847	type	True	77.6404	275	1150	95	below_threshold
Shimia abyssi	strain=DSM 100673	GCA_003014475.1	1662395	1662395	type	True	77.4568	230	1150	95	below_threshold
Shimia isoporae	strain=DSM 26433	GCA_004346865.1	647720	647720	type	True	77.2753	223	1150	95	below_threshold
Shimia marina	strain=CECT 7688	GCA_001458175.1	321267	321267	type	True	77.1964	227	1150	95	below_threshold
Shimia marina	strain=DSM 26895	GCA_900112745.1	321267	321267	type	True	77.1371	229	1150	95	below_threshold
Tritonibacter litoralis	strain=SM1979	GCA_009496005.1	2662264	2662264	type	True	76.9839	138	1150	95	below_threshold
Celeribacter litoreus	strain=ASW11-22	GCA_020165855.1	2876714	2876714	type	True	76.9177	90	1150	95	below_threshold
Phaeobacter italicus	strain=CECT 7645	GCA_001258055.1	481446	481446	type	True	76.8974	185	1150	95	below_threshold
Phaeobacter italicus	strain=DSM 26436	GCA_900113345.1	481446	481446	type	True	76.8885	186	1150	95	below_threshold
Pseudophaeobacter flagellatus	strain=MA21411-1	GCA_021228235.1	2899119	2899119	type	True	76.7877	180	1150	95	below_threshold
Ruegeria atlantica	strain=CECT 4292	GCA_001458195.1	81569	81569	suspected-type	True	76.7574	173	1150	95	below_threshold
Ruegeria meonggei	strain=CECT 8411	GCA_900172215.1	1446476	1446476	type	True	76.7372	171	1150	95	below_threshold
Phaeobacter inhibens	strain=DSM 16374	GCA_000473105.1	221822	221822	type	True	76.729	158	1150	95	below_threshold
Phaeobacter gallaeciensis	strain=DSM 26640	GCA_000819625.1	60890	60890	type	True	76.7226	165	1150	95	below_threshold
Phaeobacter piscinae	strain=P14	GCA_002407245.1	1580596	1580596	type	True	76.668	182	1150	95	below_threshold
Phaeobacter gallaeciensis	strain=DSM 26640	GCA_000511385.1	60890	60890	type	True	76.6611	168	1150	95	below_threshold
Ruegeria denitrificans	strain=CECT 5091	GCA_001458295.1	1715692	1715692	type	True	76.6549	186	1150	95	below_threshold
Primorskyibacter sedentarius	strain=DSM 104836	GCA_004342065.1	745311	745311	type	True	76.4356	153	1150	95	below_threshold
Marivivens niveibacter	strain=MCCC 1A06712	GCA_002150005.2	1930667	1930667	type	True	76.3791	120	1150	95	below_threshold
Sulfitobacter indolifex	strain=DSM 14862	GCA_022788655.1	225422	225422	type	True	76.2368	129	1150	95	below_threshold
Roseovarius confluentis	strain=SAG6	GCA_002917925.1	1852027	1852027	type	True	76.1082	117	1150	95	below_threshold
Gemmobacter fulva	strain=con5	GCA_018798885.1	2840474	2840474	type	True	75.9879	91	1150	95	below_threshold
--------------------------------------------------------------------------------
[2023-03-16 10:00:58,145] [INFO] DFAST Taxonomy check result was written to OceanDNA-b30025/tc_result.tsv
[2023-03-16 10:00:58,145] [INFO] ===== Taxonomy check completed =====
[2023-03-16 10:00:58,145] [INFO] ===== Start completeness check using CheckM =====
[2023-03-16 10:00:58,145] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stgda40fb41-339e-4648-a2c8-cf059f3df029/dqc_reference/checkm_data
[2023-03-16 10:00:58,146] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-03-16 10:00:58,375] [INFO] Task started: CheckM
[2023-03-16 10:00:58,375] [INFO] Running command: checkm taxonomy_wf --tab_table -f OceanDNA-b30025/cc_result.tsv -t 1 life "Prokaryote" OceanDNA-b30025/checkm_input OceanDNA-b30025/checkm_result
[2023-03-16 10:01:55,465] [INFO] Task succeeded: CheckM
[2023-03-16 10:01:55,465] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 70.83%
Contamintation: 4.17%
Strain heterogeneity: 100.00%
--------------------------------------------------------------------------------
[2023-03-16 10:01:55,481] [INFO] ===== Completeness check finished =====
[2023-03-16 10:01:55,481] [INFO] ===== Start GTDB Search =====
[2023-03-16 10:01:55,481] [INFO] Query marker FASTA already exists. Will reuse it. (OceanDNA-b30025/markers.fasta)
[2023-03-16 10:01:55,482] [INFO] Task started: Blastn
[2023-03-16 10:01:55,482] [INFO] Running command: blastn -query OceanDNA-b30025/markers.fasta -db /var/lib/cwl/stgda40fb41-339e-4648-a2c8-cf059f3df029/dqc_reference/reference_markers_gtdb.fasta -out OceanDNA-b30025/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-03-16 10:01:56,484] [INFO] Task succeeded: Blastn
[2023-03-16 10:01:56,486] [INFO] Selected 20 target genomes.
[2023-03-16 10:01:56,486] [INFO] Target genome list was writen to OceanDNA-b30025/target_genomes_gtdb.txt
[2023-03-16 10:01:56,503] [INFO] Task started: fastANI
[2023-03-16 10:01:56,503] [INFO] Running command: fastANI --query /var/lib/cwl/stg02dd5823-603e-4f05-9d82-a23cc7da5f58/OceanDNA-b30025.fa --refList OceanDNA-b30025/target_genomes_gtdb.txt --output OceanDNA-b30025/fastani_result_gtdb.tsv --threads 1
[2023-03-16 10:02:11,012] [INFO] Task succeeded: fastANI
[2023-03-16 10:02:11,023] [INFO] Found 20 fastANI hits (1 hits with ANI > circumscription radius)
[2023-03-16 10:02:11,023] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_001458215.1	s__Shimia thalassica	97.6841	1111	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Shimia	95.0	96.93	96.93	0.94	0.94	2	conclusive
GCF_003990645.1	s__Shimia sediminis	78.1799	361	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Shimia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900114635.1	s__Shimia aestuarii	78.0336	300	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Shimia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900142085.1	s__Shimia gijangensis	77.8636	319	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Shimia	95.0	N/A	N/A	N/A	N/A	1	-
GCA_018223085.1	s__Shimia sp018223085	77.7357	262	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Shimia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001292625.1	s__Shimia sp001292625	77.7238	261	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Shimia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900114415.1	s__Shimia haliotis	77.6424	274	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Shimia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003014475.1	s__Shimia abyssi	77.438	230	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Shimia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_017744095.1	s__Shimia sp017744095	77.4082	235	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Shimia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_017743735.1	s__Shimia sp017743735	77.3239	270	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Shimia	95.0	95.87	95.63	0.93	0.93	3	-
GCF_004799325.1	s__Thalassobius vesicularis	77.1788	184	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Thalassobius	95.0	N/A	N/A	N/A	N/A	1	-
GCA_002708925.1	s__Thalassobius sp002708925	77.0158	161	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Thalassobius	95.0	99.91	99.91	0.84	0.84	2	-
GCF_013031405.1	s__Ruegeria sp013031405	76.9702	196	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ruegeria	95.0	N/A	N/A	N/A	N/A	1	-
GCF_009496005.1	s__Epibacterium litoralis	76.9685	139	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Epibacterium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900129345.1	s__Ruegeria intermedia	76.9229	184	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ruegeria	95.0	N/A	N/A	N/A	N/A	1	-
GCA_017792425.1	s__Thalassobius sp017792425	76.8308	162	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Thalassobius	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001518015.1	s__Epibacterium horizontis	76.7791	170	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Epibacterium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900172215.1	s__Ruegeria meonggei	76.7062	171	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ruegeria	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001458295.1	s__Ruegeria denitrificans	76.6559	186	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ruegeria	95.0	N/A	N/A	N/A	N/A	1	-
GCF_017592335.1	s__Sulfitobacter_F sp017592335	76.1553	91	1150	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Sulfitobacter_F	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2023-03-16 10:02:11,024] [INFO] GTDB search result was written to OceanDNA-b30025/result_gtdb.tsv
[2023-03-16 10:02:11,024] [INFO] ===== GTDB Search completed =====
[2023-03-16 10:02:11,027] [INFO] DFAST_QC result json was written to OceanDNA-b30025/dqc_result.json
[2023-03-16 10:02:11,027] [INFO] DFAST_QC completed!
[2023-03-16 10:02:11,027] [INFO] Total running time: 0h1m55s