[2024-01-25 19:45:20,938] [INFO] DFAST_QC pipeline started.
[2024-01-25 19:45:20,940] [INFO] DFAST_QC version: 0.5.7
[2024-01-25 19:45:20,940] [INFO] DQC Reference Directory: /var/lib/cwl/stg374de499-3ca0-4642-82f4-15d13c1338ea/dqc_reference
[2024-01-25 19:45:22,125] [INFO] ===== Start taxonomy check using ANI =====
[2024-01-25 19:45:22,126] [INFO] Task started: Prodigal
[2024-01-25 19:45:22,126] [INFO] Running command: gunzip -c /var/lib/cwl/stgb572705a-a1b8-4568-9d85-49a6d917c3c8/GCF_030160375.1_ASM3016037v1_genomic.fna.gz | prodigal -d GCF_030160375.1_ASM3016037v1_genomic.fna/cds.fna -a GCF_030160375.1_ASM3016037v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2024-01-25 19:45:31,314] [INFO] Task succeeded: Prodigal
[2024-01-25 19:45:31,315] [INFO] Task started: HMMsearch
[2024-01-25 19:45:31,315] [INFO] Running command: hmmsearch --tblout GCF_030160375.1_ASM3016037v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg374de499-3ca0-4642-82f4-15d13c1338ea/dqc_reference/reference_markers.hmm GCF_030160375.1_ASM3016037v1_genomic.fna/protein.faa > /dev/null
[2024-01-25 19:45:31,563] [INFO] Task succeeded: HMMsearch
[2024-01-25 19:45:31,564] [INFO] Found 6/6 markers.
[2024-01-25 19:45:31,599] [INFO] Query marker FASTA was written to GCF_030160375.1_ASM3016037v1_genomic.fna/markers.fasta
[2024-01-25 19:45:31,599] [INFO] Task started: Blastn
[2024-01-25 19:45:31,599] [INFO] Running command: blastn -query GCF_030160375.1_ASM3016037v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg374de499-3ca0-4642-82f4-15d13c1338ea/dqc_reference/reference_markers.fasta -out GCF_030160375.1_ASM3016037v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-25 19:45:32,399] [INFO] Task succeeded: Blastn
[2024-01-25 19:45:32,402] [INFO] Selected 15 target genomes.
[2024-01-25 19:45:32,402] [INFO] Target genome list was writen to GCF_030160375.1_ASM3016037v1_genomic.fna/target_genomes.txt
[2024-01-25 19:45:32,409] [INFO] Task started: fastANI
[2024-01-25 19:45:32,410] [INFO] Running command: fastANI --query /var/lib/cwl/stgb572705a-a1b8-4568-9d85-49a6d917c3c8/GCF_030160375.1_ASM3016037v1_genomic.fna.gz --refList GCF_030160375.1_ASM3016037v1_genomic.fna/target_genomes.txt --output GCF_030160375.1_ASM3016037v1_genomic.fna/fastani_result.tsv --threads 1
[2024-01-25 19:45:44,993] [INFO] Task succeeded: fastANI
[2024-01-25 19:45:44,994] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg374de499-3ca0-4642-82f4-15d13c1338ea/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2024-01-25 19:45:44,994] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg374de499-3ca0-4642-82f4-15d13c1338ea/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2024-01-25 19:45:45,003] [INFO] Found 14 fastANI hits (0 hits with ANI > threshold)
[2024-01-25 19:45:45,003] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2024-01-25 19:45:45,003] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Mixta calida	strain=DSM 22759	GCA_002953215.1	665913	665913	type	True	84.9419	1063	1413	95	below_threshold
Mixta calida	strain=LMG 25383	GCA_002095355.1	665913	665913	type	True	84.9308	979	1413	95	below_threshold
Mixta gaviniae	strain=DSM 22758	GCA_002953195.1	665914	665914	type	True	84.8553	1058	1413	95	below_threshold
Mixta tenebrionis	strain=BIT-26	GCA_006517625.1	2562439	2562439	type	True	84.1906	1009	1413	95	below_threshold
Pantoea septica	strain=LMG 5345	GCA_002095575.1	472695	472695	type	True	80.1993	740	1413	95	below_threshold
Erwinia iniecta	strain=B120	GCA_001267535.1	1560201	1560201	type	True	80.1448	698	1413	95	below_threshold
Erwinia phyllosphaerae	strain=CMYE1	GCA_019132875.1	2853256	2853256	type	True	79.7406	679	1413	95	below_threshold
Erwinia aphidicola	strain=X001	GCA_024169515.1	68334	68334	type	True	79.7029	682	1413	95	below_threshold
Erwinia aphidicola	strain=JCM 21238	GCA_014773485.1	68334	68334	type	True	79.6994	673	1413	95	below_threshold
Siccibacter turicensis	strain=LMG 23730	GCA_000463155.2	357233	357233	type	True	78.9158	502	1413	95	below_threshold
Cronobacter sakazakii	strain=ATCC 29544	GCA_001971035.1	28141	28141	type	True	78.858	521	1413	95	below_threshold
Enterobacter wuhouensis	strain=WCHEW120002	GCA_004331265.1	2529381	2529381	type	True	78.73	474	1413	95	below_threshold
Enterobacter roggenkampii	strain=DSM 16690	GCA_024390995.1	1812935	1812935	type	True	78.6561	473	1413	95	below_threshold
Providencia thailandensis	strain=KCTC 23281	GCA_014652175.1	990144	990144	type	True	77.8782	79	1413	95	below_threshold
--------------------------------------------------------------------------------
[2024-01-25 19:45:45,004] [INFO] DFAST Taxonomy check result was written to GCF_030160375.1_ASM3016037v1_genomic.fna/tc_result.tsv
[2024-01-25 19:45:45,005] [INFO] ===== Taxonomy check completed =====
[2024-01-25 19:45:45,005] [INFO] ===== Start completeness check using CheckM =====
[2024-01-25 19:45:45,005] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg374de499-3ca0-4642-82f4-15d13c1338ea/dqc_reference/checkm_data
[2024-01-25 19:45:45,006] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2024-01-25 19:45:45,053] [INFO] Task started: CheckM
[2024-01-25 19:45:45,053] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCF_030160375.1_ASM3016037v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCF_030160375.1_ASM3016037v1_genomic.fna/checkm_input GCF_030160375.1_ASM3016037v1_genomic.fna/checkm_result
[2024-01-25 19:46:14,792] [INFO] Task succeeded: CheckM
[2024-01-25 19:46:14,793] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2024-01-25 19:46:14,810] [INFO] ===== Completeness check finished =====
[2024-01-25 19:46:14,810] [INFO] ===== Start GTDB Search =====
[2024-01-25 19:46:14,811] [INFO] Query marker FASTA already exists. Will reuse it. (GCF_030160375.1_ASM3016037v1_genomic.fna/markers.fasta)
[2024-01-25 19:46:14,811] [INFO] Task started: Blastn
[2024-01-25 19:46:14,811] [INFO] Running command: blastn -query GCF_030160375.1_ASM3016037v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg374de499-3ca0-4642-82f4-15d13c1338ea/dqc_reference/reference_markers_gtdb.fasta -out GCF_030160375.1_ASM3016037v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-25 19:46:15,998] [INFO] Task succeeded: Blastn
[2024-01-25 19:46:16,005] [INFO] Selected 10 target genomes.
[2024-01-25 19:46:16,005] [INFO] Target genome list was writen to GCF_030160375.1_ASM3016037v1_genomic.fna/target_genomes_gtdb.txt
[2024-01-25 19:46:16,016] [INFO] Task started: fastANI
[2024-01-25 19:46:16,017] [INFO] Running command: fastANI --query /var/lib/cwl/stgb572705a-a1b8-4568-9d85-49a6d917c3c8/GCF_030160375.1_ASM3016037v1_genomic.fna.gz --refList GCF_030160375.1_ASM3016037v1_genomic.fna/target_genomes_gtdb.txt --output GCF_030160375.1_ASM3016037v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2024-01-25 19:46:27,190] [INFO] Task succeeded: fastANI
[2024-01-25 19:46:27,198] [INFO] Found 10 fastANI hits (1 hits with ANI > circumscription radius)
[2024-01-25 19:46:27,198] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_002895925.1	s__Mixta theicola	99.9843	1395	1413	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Mixta	95.0	N/A	N/A	N/A	N/A	1	conclusive
GCF_002101395.1	s__Mixta alhagi	89.5204	1114	1413	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Mixta	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002953215.1	s__Mixta calida	84.9267	1065	1413	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Mixta	95.0	99.65	99.42	0.96	0.92	22	-
GCF_002953195.1	s__Mixta gaviniae	84.8261	1061	1413	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Mixta	95.0	99.91	99.91	0.99	0.99	2	-
GCF_006517625.1	s__Mixta tenebrionis	84.1892	1009	1413	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Mixta	95.0	99.45	99.45	0.92	0.92	2	-
GCF_009914055.1	s__Mixta intestinalis	83.6296	990	1413	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Mixta	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002920175.1	s__Pantoea sp002920175	80.7763	767	1413	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Pantoea	95.0	99.09	99.09	0.94	0.94	2	-
GCF_002095465.1	s__Pantoea rodasii	79.5932	604	1413	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Pantoea	95.0	99.97	99.97	1.00	1.00	2	-
GCF_018842675.1	s__Pantoea sp018842675	79.4484	584	1413	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Pantoea	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002858715.1	s__Chimaeribacter coloradensis	79.002	525	1413	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Chimaeribacter	95.0	98.66	98.66	0.97	0.97	2	-
--------------------------------------------------------------------------------
[2024-01-25 19:46:27,199] [INFO] GTDB search result was written to GCF_030160375.1_ASM3016037v1_genomic.fna/result_gtdb.tsv
[2024-01-25 19:46:27,200] [INFO] ===== GTDB Search completed =====
[2024-01-25 19:46:27,206] [INFO] DFAST_QC result json was written to GCF_030160375.1_ASM3016037v1_genomic.fna/dqc_result.json
[2024-01-25 19:46:27,206] [INFO] DFAST_QC completed!
[2024-01-25 19:46:27,206] [INFO] Total running time: 0h1m6s