[2024-01-24 12:28:35,040] [INFO] DFAST_QC pipeline started.
[2024-01-24 12:28:35,042] [INFO] DFAST_QC version: 0.5.7
[2024-01-24 12:28:35,042] [INFO] DQC Reference Directory: /var/lib/cwl/stg28f48ffc-d9b1-4934-9f75-bbbc9dd876e1/dqc_reference
[2024-01-24 12:28:36,244] [INFO] ===== Start taxonomy check using ANI =====
[2024-01-24 12:28:36,244] [INFO] Task started: Prodigal
[2024-01-24 12:28:36,244] [INFO] Running command: gunzip -c /var/lib/cwl/stg6f0366f2-5603-4279-bfd2-34f73616988a/GCF_020037475.1_ASM2003747v1_genomic.fna.gz | prodigal -d GCF_020037475.1_ASM2003747v1_genomic.fna/cds.fna -a GCF_020037475.1_ASM2003747v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2024-01-24 12:28:43,451] [INFO] Task succeeded: Prodigal
[2024-01-24 12:28:43,451] [INFO] Task started: HMMsearch
[2024-01-24 12:28:43,451] [INFO] Running command: hmmsearch --tblout GCF_020037475.1_ASM2003747v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg28f48ffc-d9b1-4934-9f75-bbbc9dd876e1/dqc_reference/reference_markers.hmm GCF_020037475.1_ASM2003747v1_genomic.fna/protein.faa > /dev/null
[2024-01-24 12:28:43,703] [INFO] Task succeeded: HMMsearch
[2024-01-24 12:28:43,705] [WARNING] Found 5/6 markers. [/var/lib/cwl/stg6f0366f2-5603-4279-bfd2-34f73616988a/GCF_020037475.1_ASM2003747v1_genomic.fna.gz]
[2024-01-24 12:28:43,747] [INFO] Query marker FASTA was written to GCF_020037475.1_ASM2003747v1_genomic.fna/markers.fasta
[2024-01-24 12:28:43,748] [INFO] Task started: Blastn
[2024-01-24 12:28:43,748] [INFO] Running command: blastn -query GCF_020037475.1_ASM2003747v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg28f48ffc-d9b1-4934-9f75-bbbc9dd876e1/dqc_reference/reference_markers.fasta -out GCF_020037475.1_ASM2003747v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 12:28:44,366] [INFO] Task succeeded: Blastn
[2024-01-24 12:28:44,370] [INFO] Selected 20 target genomes.
[2024-01-24 12:28:44,370] [INFO] Target genome list was writen to GCF_020037475.1_ASM2003747v1_genomic.fna/target_genomes.txt
[2024-01-24 12:28:44,504] [INFO] Task started: fastANI
[2024-01-24 12:28:44,505] [INFO] Running command: fastANI --query /var/lib/cwl/stg6f0366f2-5603-4279-bfd2-34f73616988a/GCF_020037475.1_ASM2003747v1_genomic.fna.gz --refList GCF_020037475.1_ASM2003747v1_genomic.fna/target_genomes.txt --output GCF_020037475.1_ASM2003747v1_genomic.fna/fastani_result.tsv --threads 1
[2024-01-24 12:28:58,662] [INFO] Task succeeded: fastANI
[2024-01-24 12:28:58,663] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg28f48ffc-d9b1-4934-9f75-bbbc9dd876e1/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2024-01-24 12:28:58,663] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg28f48ffc-d9b1-4934-9f75-bbbc9dd876e1/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2024-01-24 12:28:58,677] [INFO] Found 14 fastANI hits (1 hits with ANI > threshold)
[2024-01-24 12:28:58,677] [INFO] The taxonomy check result is classified as 'conclusive'.
[2024-01-24 12:28:58,677] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Sutcliffiella deserti	strain=DG-18	GCA_020037475.1	2875501	2875501	type	True	100.0	1141	1141	95	conclusive
Sutcliffiella halmapala	strain=DSM 8723	GCA_002019665.1	79882	79882	type	True	79.8195	420	1141	95	below_threshold
Sutcliffiella cohnii	strain=NBRC 15565	GCA_001591425.1	33932	33932	type	True	77.7864	145	1141	95	below_threshold
Sutcliffiella cohnii	strain=DSM 6307	GCA_002250055.1	33932	33932	type	True	77.6883	158	1141	95	below_threshold
Bacillus alkalisoli	strain=FJAT-45122	GCA_002797415.1	2011008	2011008	type	True	77.6746	182	1141	95	below_threshold
Bacillus paranthracis	strain=Mn5	GCA_001883995.1	2026186	2026186	type	True	77.3958	56	1141	95	below_threshold
Bacillus dafuensis	strain=FJAT-25496	GCA_007995155.1	1742359	1742359	type	True	77.27	69	1141	95	below_threshold
Bacillus proteolyticus	strain=TD42	GCA_001884065.1	2026192	2026192	type	True	77.1997	60	1141	95	below_threshold
Priestia endophytica	strain=DSM 13796	GCA_900115845.1	135735	135735	type	True	77.0962	60	1141	95	below_threshold
Bacillus mycoides	strain=DSM 2048	GCA_022630575.1	1405	1405	type	True	77.0627	70	1141	95	below_threshold
Metabacillus iocasae	strain=DSM 104297	GCA_016909075.1	2291674	2291674	type	True	76.9483	78	1141	95	below_threshold
Bacillus mycoides	strain=DSM 2048	GCA_000003925.1	1405	1405	type	True	76.8106	65	1141	95	below_threshold
Metabacillus flavus	strain=KIGAM252	GCA_018283675.1	2823519	2823519	type	True	76.7723	51	1141	95	below_threshold
Bacillus pakistanensis	strain=DSM 24834	GCA_016908495.1	992288	992288	type	True	76.6218	86	1141	95	below_threshold
--------------------------------------------------------------------------------
[2024-01-24 12:28:58,679] [INFO] DFAST Taxonomy check result was written to GCF_020037475.1_ASM2003747v1_genomic.fna/tc_result.tsv
[2024-01-24 12:28:58,679] [INFO] ===== Taxonomy check completed =====
[2024-01-24 12:28:58,680] [INFO] ===== Start completeness check using CheckM =====
[2024-01-24 12:28:58,680] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg28f48ffc-d9b1-4934-9f75-bbbc9dd876e1/dqc_reference/checkm_data
[2024-01-24 12:28:58,681] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2024-01-24 12:28:58,718] [INFO] Task started: CheckM
[2024-01-24 12:28:58,718] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCF_020037475.1_ASM2003747v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCF_020037475.1_ASM2003747v1_genomic.fna/checkm_input GCF_020037475.1_ASM2003747v1_genomic.fna/checkm_result
[2024-01-24 12:29:25,891] [INFO] Task succeeded: CheckM
[2024-01-24 12:29:25,892] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 87.50%
Contamintation: 4.17%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2024-01-24 12:29:25,916] [INFO] ===== Completeness check finished =====
[2024-01-24 12:29:25,916] [INFO] ===== Start GTDB Search =====
[2024-01-24 12:29:25,917] [INFO] Query marker FASTA already exists. Will reuse it. (GCF_020037475.1_ASM2003747v1_genomic.fna/markers.fasta)
[2024-01-24 12:29:25,917] [INFO] Task started: Blastn
[2024-01-24 12:29:25,917] [INFO] Running command: blastn -query GCF_020037475.1_ASM2003747v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg28f48ffc-d9b1-4934-9f75-bbbc9dd876e1/dqc_reference/reference_markers_gtdb.fasta -out GCF_020037475.1_ASM2003747v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 12:29:26,807] [INFO] Task succeeded: Blastn
[2024-01-24 12:29:26,812] [INFO] Selected 20 target genomes.
[2024-01-24 12:29:26,812] [INFO] Target genome list was writen to GCF_020037475.1_ASM2003747v1_genomic.fna/target_genomes_gtdb.txt
[2024-01-24 12:29:26,905] [INFO] Task started: fastANI
[2024-01-24 12:29:26,906] [INFO] Running command: fastANI --query /var/lib/cwl/stg6f0366f2-5603-4279-bfd2-34f73616988a/GCF_020037475.1_ASM2003747v1_genomic.fna.gz --refList GCF_020037475.1_ASM2003747v1_genomic.fna/target_genomes_gtdb.txt --output GCF_020037475.1_ASM2003747v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2024-01-24 12:29:40,972] [INFO] Task succeeded: fastANI
[2024-01-24 12:29:40,990] [INFO] Found 19 fastANI hits (0 hits with ANI > circumscription radius)
[2024-01-24 12:29:40,990] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_002019665.1	s__Sutcliffiella_A halmapala	79.8429	418	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae_I;g__Sutcliffiella_A	95.0	N/A	N/A	N/A	N/A	1	-
GCF_016908565.1	s__Sutcliffiella_A tianshenii	79.2729	374	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae_I;g__Sutcliffiella_A	95.0	N/A	N/A	N/A	N/A	1	-
GCF_012524115.1	s__Sutcliffiella_A sp012524115	79.0359	294	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae_I;g__Sutcliffiella_A	95.0	N/A	N/A	N/A	N/A	1	-
GCA_001648575.1	s__Sutcliffiella_A horikoshii	78.9704	275	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae_I;g__Sutcliffiella_A	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002157855.1	s__Sutcliffiella_A horikoshii_C	78.8893	302	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae_I;g__Sutcliffiella_A	95.0	96.53	95.82	0.92	0.90	4	-
GCA_001636495.1	s__Sutcliffiella_A horikoshii_A	78.865	316	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae_I;g__Sutcliffiella_A	95.0	N/A	N/A	N/A	N/A	1	-
GCF_008180725.1	s__Sutcliffiella_A horikoshii_B	78.8304	305	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae_I;g__Sutcliffiella_A	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001293645.1	s__Sutcliffiella_A sp001293645	78.8035	313	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae_I;g__Sutcliffiella_A	95.0	N/A	N/A	N/A	N/A	1	-
GCF_012911845.1	s__Sutcliffiella_A sp012911845	78.6949	291	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae_I;g__Sutcliffiella_A	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002335755.1	s__FJAT-45066 sp002335755	77.8116	176	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae_I;g__FJAT-45066	95.0	99.38	99.38	0.94	0.94	2	-
GCF_002250055.1	s__Sutcliffiella cohnii	77.7488	155	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae_I;g__Sutcliffiella	95.0	100.00	100.00	0.97	0.97	2	-
GCF_016909075.1	s__Priestia iocasae	76.9483	78	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae_H;g__Priestia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002584985.1	s__Bacillus_A sp002584985	76.9152	63	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae_G;g__Bacillus_A	95.0	99.52	99.41	0.96	0.94	8	-
GCF_002975175.1	s__Bacillus_A sp002975175	76.8747	66	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae_G;g__Bacillus_A	95.0	N/A	N/A	N/A	N/A	1	-
GCF_007673305.1	s__Bacillus_A mycoides_C	76.7905	71	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae_G;g__Bacillus_A	95.0	95.83	95.78	0.82	0.81	3	-
GCF_014874135.1	s__Neobacillus sp014874135	76.7154	57	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales_B;f__DSM-18226;g__Neobacillus	95.0	N/A	N/A	N/A	N/A	1	-
GCF_016908495.1	s__Bacillus_BW pakistanensis	76.6218	86	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales_B;f__Bacillaceae_B;g__Bacillus_BW	95.0	N/A	N/A	N/A	N/A	1	-
GCF_016820555.1	s__FJAT-46582 sp016820555	76.5291	65	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales_B;f__Domibacillaceae;g__FJAT-46582	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001884235.1	s__Bacillus_A paramycoides	76.515	66	1141	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae_G;g__Bacillus_A	95.0	97.72	97.18	0.88	0.85	10	-
--------------------------------------------------------------------------------
[2024-01-24 12:29:40,992] [INFO] GTDB search result was written to GCF_020037475.1_ASM2003747v1_genomic.fna/result_gtdb.tsv
[2024-01-24 12:29:40,992] [INFO] ===== GTDB Search completed =====
[2024-01-24 12:29:41,288] [INFO] DFAST_QC result json was written to GCF_020037475.1_ASM2003747v1_genomic.fna/dqc_result.json
[2024-01-24 12:29:41,289] [INFO] DFAST_QC completed!
[2024-01-24 12:29:41,289] [INFO] Total running time: 0h1m6s