[2023-03-15 11:36:00,595] [INFO] DFAST_QC pipeline started.
[2023-03-15 11:36:00,599] [INFO] DFAST_QC version: 0.5.7
[2023-03-15 11:36:00,599] [INFO] DQC Reference Directory: /var/lib/cwl/stg342c6c9c-eeef-478c-bc29-42d28cb11b26/dqc_reference
[2023-03-15 11:36:02,576] [INFO] ===== Start taxonomy check using ANI =====
[2023-03-15 11:36:02,577] [INFO] Task started: Prodigal
[2023-03-15 11:36:02,577] [INFO] Running command: cat /var/lib/cwl/stge5722d23-04ce-4e04-950f-31261d119253/OceanDNA-b38354.fa | prodigal -d OceanDNA-b38354/cds.fna -a OceanDNA-b38354/protein.faa -g 11 -q > /dev/null
[2023-03-15 11:36:16,474] [INFO] Task succeeded: Prodigal
[2023-03-15 11:36:16,474] [INFO] Task started: HMMsearch
[2023-03-15 11:36:16,474] [INFO] Running command: hmmsearch --tblout OceanDNA-b38354/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg342c6c9c-eeef-478c-bc29-42d28cb11b26/dqc_reference/reference_markers.hmm OceanDNA-b38354/protein.faa > /dev/null
[2023-03-15 11:36:16,691] [INFO] Task succeeded: HMMsearch
[2023-03-15 11:36:16,691] [WARNING] Found 5/6 markers. [/var/lib/cwl/stge5722d23-04ce-4e04-950f-31261d119253/OceanDNA-b38354.fa]
[2023-03-15 11:36:16,760] [INFO] Query marker FASTA was written to OceanDNA-b38354/markers.fasta
[2023-03-15 11:36:16,761] [INFO] Task started: Blastn
[2023-03-15 11:36:16,761] [INFO] Running command: blastn -query OceanDNA-b38354/markers.fasta -db /var/lib/cwl/stg342c6c9c-eeef-478c-bc29-42d28cb11b26/dqc_reference/reference_markers.fasta -out OceanDNA-b38354/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-03-15 11:36:17,505] [INFO] Task succeeded: Blastn
[2023-03-15 11:36:17,515] [INFO] Selected 23 target genomes.
[2023-03-15 11:36:17,516] [INFO] Target genome list was writen to OceanDNA-b38354/target_genomes.txt
[2023-03-15 11:36:17,528] [INFO] Task started: fastANI
[2023-03-15 11:36:17,528] [INFO] Running command: fastANI --query /var/lib/cwl/stge5722d23-04ce-4e04-950f-31261d119253/OceanDNA-b38354.fa --refList OceanDNA-b38354/target_genomes.txt --output OceanDNA-b38354/fastani_result.tsv --threads 1
[2023-03-15 11:36:33,654] [INFO] Task succeeded: fastANI
[2023-03-15 11:36:33,655] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg342c6c9c-eeef-478c-bc29-42d28cb11b26/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-03-15 11:36:33,655] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg342c6c9c-eeef-478c-bc29-42d28cb11b26/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-03-15 11:36:33,668] [INFO] Found 23 fastANI hits (0 hits with ANI > threshold)
[2023-03-15 11:36:33,668] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-03-15 11:36:33,668] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Salinicola salarius	strain=DSM 18044	GCA_003206135.1	430457	430457	type	True	94.7117	594	662	95	below_threshold
Salinicola socius	strain=DSM 19940	GCA_003206115.1	404433	404433	type	True	85.5792	541	662	95	below_threshold
Salinicola socius	strain=SMB35	GCA_001937195.1	404433	404433	type	True	85.5555	539	662	95	below_threshold
Salinicola halophyticus	strain=CR45	GCA_003206695.1	1808881	1808881	type	True	84.7803	529	662	95	below_threshold
Salinicola lusitanus	strain=CR50	GCA_003206045.1	1949085	1949085	type	True	84.3983	546	662	95	below_threshold
Salinicola acroporae	strain=LMG 28587	GCA_003206615.1	1541440	1541440	type	True	84.118	536	662	95	below_threshold
Salinicola halimionae	strain=CPA60	GCA_003206065.1	1949081	1949081	type	True	83.9868	520	662	95	below_threshold
Salinicola peritrichatus	strain=JCM 18795	GCA_003206715.1	1267424	1267424	type	True	83.5624	513	662	95	below_threshold
Salinicola endophyticus	strain=CPA92	GCA_003206575.1	1949083	1949083	type	True	81.6778	468	662	95	below_threshold
Salinicola tamaricis	strain=F01	GCA_003006155.1	1771309	1771309	type	True	81.5938	459	662	95	below_threshold
Halomonas shengliensis	strain=CGMCC 1.6444	GCA_900104135.1	419597	419597	type	True	78.8868	232	662	95	below_threshold
Halomonas lactosivorans	strain=KCTC 52281	GCA_003254665.1	2185141	2185141	type	True	78.8444	270	662	95	below_threshold
Halomonas salipaludis	strain=WRN001	GCA_002286975.1	2032625	2032625	type	True	78.7995	279	662	95	below_threshold
Halomonas stenophila	strain=CECT 7744	GCA_014192275.1	795312	795312	type	True	78.7986	253	662	95	below_threshold
Halomonas sulfidoxydans	strain=MCCC 1A11059	GCA_017868775.1	2733484	2733484	type	True	78.7552	266	662	95	below_threshold
Halomonas zhangzhouensis	strain=MCCC 1A11036	GCA_021404465.1	2733481	2733481	type	True	78.4013	214	662	95	below_threshold
Halomonas aerodenitrificans	strain=MCCC 1A11058	GCA_021404405.1	2733483	2733483	type	True	78.358	248	662	95	below_threshold
Halomonas lysinitropha	strain=3(2)	GCA_902500215.1	2607506	2607506	type	True	78.1959	205	662	95	below_threshold
Halomonas tianxiuensis	strain=BC-M4-5	GCA_009834345.1	2497861	2497861	type	True	78.1869	237	662	95	below_threshold
Halomonas icarae	strain=D1-1	GCA_009901955.1	2691040	2691040	type	True	78.1318	172	662	95	below_threshold
Halomonas urumqiensis	strain=BZ-SZ-XJ27	GCA_003028575.1	1684789	1684789	type	True	78.0557	193	662	95	below_threshold
Halomonas urumqiensis	strain=BZ-SZ-XJ27	GCA_002879635.1	1684789	1684789	type	True	78.0072	195	662	95	below_threshold
Halomonas halodenitrificans	strain=DSM 735	GCA_000620045.1	28252	28252	type	True	77.8531	172	662	95	below_threshold
--------------------------------------------------------------------------------
[2023-03-15 11:36:33,673] [INFO] DFAST Taxonomy check result was written to OceanDNA-b38354/tc_result.tsv
[2023-03-15 11:36:33,680] [INFO] ===== Taxonomy check completed =====
[2023-03-15 11:36:33,680] [INFO] ===== Start completeness check using CheckM =====
[2023-03-15 11:36:33,680] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg342c6c9c-eeef-478c-bc29-42d28cb11b26/dqc_reference/checkm_data
[2023-03-15 11:36:33,681] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-03-15 11:36:33,777] [INFO] Task started: CheckM
[2023-03-15 11:36:33,777] [INFO] Running command: checkm taxonomy_wf --tab_table -f OceanDNA-b38354/cc_result.tsv -t 1 life "Prokaryote" OceanDNA-b38354/checkm_input OceanDNA-b38354/checkm_result
[2023-03-15 11:37:12,102] [INFO] Task succeeded: CheckM
[2023-03-15 11:37:12,103] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 84.81%
Contamintation: 13.54%
Strain heterogeneity: 100.00%
--------------------------------------------------------------------------------
[2023-03-15 11:37:12,274] [INFO] ===== Completeness check finished =====
[2023-03-15 11:37:12,274] [INFO] ===== Start GTDB Search =====
[2023-03-15 11:37:12,274] [INFO] Query marker FASTA already exists. Will reuse it. (OceanDNA-b38354/markers.fasta)
[2023-03-15 11:37:12,275] [INFO] Task started: Blastn
[2023-03-15 11:37:12,275] [INFO] Running command: blastn -query OceanDNA-b38354/markers.fasta -db /var/lib/cwl/stg342c6c9c-eeef-478c-bc29-42d28cb11b26/dqc_reference/reference_markers_gtdb.fasta -out OceanDNA-b38354/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-03-15 11:37:13,472] [INFO] Task succeeded: Blastn
[2023-03-15 11:37:13,482] [INFO] Selected 10 target genomes.
[2023-03-15 11:37:13,482] [INFO] Target genome list was writen to OceanDNA-b38354/target_genomes_gtdb.txt
[2023-03-15 11:37:13,493] [INFO] Task started: fastANI
[2023-03-15 11:37:13,493] [INFO] Running command: fastANI --query /var/lib/cwl/stge5722d23-04ce-4e04-950f-31261d119253/OceanDNA-b38354.fa --refList OceanDNA-b38354/target_genomes_gtdb.txt --output OceanDNA-b38354/fastani_result_gtdb.tsv --threads 1
[2023-03-15 11:37:21,604] [INFO] Task succeeded: fastANI
[2023-03-15 11:37:21,611] [INFO] Found 10 fastANI hits (1 hits with ANI > circumscription radius)
[2023-03-15 11:37:21,611] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_002179555.1	s__Salinicola salarius_A	98.0655	632	662	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Halomonadaceae;g__Salinicola	95.0	N/A	N/A	N/A	N/A	1	conclusive
GCF_003206135.1	s__Salinicola salarius	94.7117	594	662	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Halomonadaceae;g__Salinicola	95.0	97.80	97.80	0.89	0.89	2	-
GCF_001937195.1	s__Salinicola socius	85.5555	539	662	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Halomonadaceae;g__Salinicola	95.0	99.48	98.96	0.96	0.93	3	-
GCA_002695065.1	s__Salinicola sp002695065	85.2622	519	662	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Halomonadaceae;g__Salinicola	95.0	97.83	97.83	0.85	0.85	2	-
GCF_008298015.1	s__Salinicola sp008298015	84.9588	531	662	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Halomonadaceae;g__Salinicola	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003206695.1	s__Salinicola halophyticus	84.7911	528	662	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Halomonadaceae;g__Salinicola	95.0	97.72	97.72	0.93	0.93	2	-
GCF_003206045.1	s__Salinicola lusitanus	84.3851	546	662	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Halomonadaceae;g__Salinicola	95.0	96.87	96.87	0.93	0.93	2	-
GCF_003206715.1	s__Salinicola peritrichatus	83.5757	512	662	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Halomonadaceae;g__Salinicola	95.0	N/A	N/A	N/A	N/A	1	-
GCF_014652715.1	s__Salinicola rhizosphaerae	80.8354	452	662	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Halomonadaceae;g__Salinicola	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003206645.1	s__Salinicola aestuarinus	80.2205	383	662	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Halomonadaceae;g__Salinicola	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2023-03-15 11:37:21,615] [INFO] GTDB search result was written to OceanDNA-b38354/result_gtdb.tsv
[2023-03-15 11:37:21,622] [INFO] ===== GTDB Search completed =====
[2023-03-15 11:37:21,630] [INFO] DFAST_QC result json was written to OceanDNA-b38354/dqc_result.json
[2023-03-15 11:37:21,630] [INFO] DFAST_QC completed!
[2023-03-15 11:37:21,630] [INFO] Total running time: 0h1m21s