[2024-01-24 11:44:23,568] [INFO] DFAST_QC pipeline started.
[2024-01-24 11:44:23,570] [INFO] DFAST_QC version: 0.5.7
[2024-01-24 11:44:23,570] [INFO] DQC Reference Directory: /var/lib/cwl/stg6f804158-76e1-4a94-8d92-6f78ee8cc750/dqc_reference
[2024-01-24 11:44:26,339] [INFO] ===== Start taxonomy check using ANI =====
[2024-01-24 11:44:26,340] [INFO] Task started: Prodigal
[2024-01-24 11:44:26,341] [INFO] Running command: gunzip -c /var/lib/cwl/stge92f532e-91fc-411f-bc00-c9d49fdb7d48/GCF_025660375.1_ASM2566037v1_genomic.fna.gz | prodigal -d GCF_025660375.1_ASM2566037v1_genomic.fna/cds.fna -a GCF_025660375.1_ASM2566037v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2024-01-24 11:44:32,601] [INFO] Task succeeded: Prodigal
[2024-01-24 11:44:32,602] [INFO] Task started: HMMsearch
[2024-01-24 11:44:32,602] [INFO] Running command: hmmsearch --tblout GCF_025660375.1_ASM2566037v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg6f804158-76e1-4a94-8d92-6f78ee8cc750/dqc_reference/reference_markers.hmm GCF_025660375.1_ASM2566037v1_genomic.fna/protein.faa > /dev/null
[2024-01-24 11:44:32,895] [INFO] Task succeeded: HMMsearch
[2024-01-24 11:44:32,896] [INFO] Found 6/6 markers.
[2024-01-24 11:44:32,923] [INFO] Query marker FASTA was written to GCF_025660375.1_ASM2566037v1_genomic.fna/markers.fasta
[2024-01-24 11:44:32,924] [INFO] Task started: Blastn
[2024-01-24 11:44:32,924] [INFO] Running command: blastn -query GCF_025660375.1_ASM2566037v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg6f804158-76e1-4a94-8d92-6f78ee8cc750/dqc_reference/reference_markers.fasta -out GCF_025660375.1_ASM2566037v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 11:44:33,531] [INFO] Task succeeded: Blastn
[2024-01-24 11:44:33,535] [INFO] Selected 30 target genomes.
[2024-01-24 11:44:33,535] [INFO] Target genome list was writen to GCF_025660375.1_ASM2566037v1_genomic.fna/target_genomes.txt
[2024-01-24 11:44:33,578] [INFO] Task started: fastANI
[2024-01-24 11:44:33,579] [INFO] Running command: fastANI --query /var/lib/cwl/stge92f532e-91fc-411f-bc00-c9d49fdb7d48/GCF_025660375.1_ASM2566037v1_genomic.fna.gz --refList GCF_025660375.1_ASM2566037v1_genomic.fna/target_genomes.txt --output GCF_025660375.1_ASM2566037v1_genomic.fna/fastani_result.tsv --threads 1
[2024-01-24 11:44:51,962] [INFO] Task succeeded: fastANI
[2024-01-24 11:44:51,962] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg6f804158-76e1-4a94-8d92-6f78ee8cc750/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2024-01-24 11:44:51,963] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg6f804158-76e1-4a94-8d92-6f78ee8cc750/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2024-01-24 11:44:51,983] [INFO] Found 17 fastANI hits (0 hits with ANI > threshold)
[2024-01-24 11:44:51,983] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2024-01-24 11:44:51,983] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Caldibacillus pasinlerensis	strain=P1	GCA_009996845.1	2703818	2703818	type	True	82.2608	593	1078	95	below_threshold
Bacillus andreraoultii	strain=SIT1	GCA_001244735.1	1499685	1499685	type	True	78.2745	203	1078	95	below_threshold
Bacillus kwashiorkori	strain=SIT6	GCA_001375515.1	1522318	1522318	type	True	77.7934	103	1078	95	below_threshold
Ornithinibacillus halotolerans	strain=CGMCC 1.12408	GCA_014637405.1	1274357	1274357	type	True	77.4523	60	1078	95	below_threshold
Heyndrickxia vini	strain=JCM 19841	GCA_016772275.1	1476025	1476025	type	True	76.8988	81	1078	95	below_threshold
Cytobacillus solani	strain=FJAT-18043	GCA_001420595.1	1637975	1637975	type	True	76.7701	66	1078	95	below_threshold
Sutcliffiella cohnii	strain=DSM 6307	GCA_002250055.1	33932	33932	type	True	76.6153	67	1078	95	below_threshold
Anoxybacillus vitaminiphilus	strain=CGMCC 1.8979	GCA_003259935.1	581036	581036	type	True	76.456	68	1078	95	below_threshold
Peribacillus alkalitolerans	strain=KCTC 33631	GCA_010882125.1	1550385	1550385	type	True	76.4298	50	1078	95	below_threshold
Peribacillus butanolivorans	strain=DSM 18926	GCA_001273755.1	421767	421767	type	True	76.3783	60	1078	95	below_threshold
Ureibacillus massiliensis	strain=4400831	GCA_002200855.1	292806	292806	type	True	76.3473	62	1078	95	below_threshold
Ureibacillus massiliensis	strain=CCUG 49529	GCA_000772965.1	292806	292806	type	True	76.3473	62	1078	95	below_threshold
Metabacillus sediminilitoris	strain=DSL-17	GCA_009720625.1	2567941	2567941	type	True	76.2722	73	1078	95	below_threshold
Metabacillus sediminilitoris	strain=DSL-17	GCA_004801455.1	2567941	2567941	type	True	76.0407	70	1078	95	below_threshold
Priestia megaterium	strain=ATCC 14581	GCA_900113355.1	1404	1404	suspected-type	True	75.9757	71	1078	95	below_threshold
Priestia megaterium	strain=NBRC 15308	GCA_001591525.1	1404	1404	type	True	75.9212	69	1078	95	below_threshold
Schinkia azotoformans	strain=LMG 9581	GCA_000307855.1	1454	1454	type	True	75.6242	59	1078	95	below_threshold
--------------------------------------------------------------------------------
[2024-01-24 11:44:51,985] [INFO] DFAST Taxonomy check result was written to GCF_025660375.1_ASM2566037v1_genomic.fna/tc_result.tsv
[2024-01-24 11:44:51,985] [INFO] ===== Taxonomy check completed =====
[2024-01-24 11:44:51,986] [INFO] ===== Start completeness check using CheckM =====
[2024-01-24 11:44:51,986] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg6f804158-76e1-4a94-8d92-6f78ee8cc750/dqc_reference/checkm_data
[2024-01-24 11:44:51,987] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2024-01-24 11:44:52,021] [INFO] Task started: CheckM
[2024-01-24 11:44:52,021] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCF_025660375.1_ASM2566037v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCF_025660375.1_ASM2566037v1_genomic.fna/checkm_input GCF_025660375.1_ASM2566037v1_genomic.fna/checkm_result
[2024-01-24 11:45:17,076] [INFO] Task succeeded: CheckM
[2024-01-24 11:45:17,077] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2024-01-24 11:45:17,092] [INFO] ===== Completeness check finished =====
[2024-01-24 11:45:17,092] [INFO] ===== Start GTDB Search =====
[2024-01-24 11:45:17,093] [INFO] Query marker FASTA already exists. Will reuse it. (GCF_025660375.1_ASM2566037v1_genomic.fna/markers.fasta)
[2024-01-24 11:45:17,093] [INFO] Task started: Blastn
[2024-01-24 11:45:17,094] [INFO] Running command: blastn -query GCF_025660375.1_ASM2566037v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg6f804158-76e1-4a94-8d92-6f78ee8cc750/dqc_reference/reference_markers_gtdb.fasta -out GCF_025660375.1_ASM2566037v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 11:45:17,919] [INFO] Task succeeded: Blastn
[2024-01-24 11:45:17,923] [INFO] Selected 23 target genomes.
[2024-01-24 11:45:17,924] [INFO] Target genome list was writen to GCF_025660375.1_ASM2566037v1_genomic.fna/target_genomes_gtdb.txt
[2024-01-24 11:45:17,941] [INFO] Task started: fastANI
[2024-01-24 11:45:17,942] [INFO] Running command: fastANI --query /var/lib/cwl/stge92f532e-91fc-411f-bc00-c9d49fdb7d48/GCF_025660375.1_ASM2566037v1_genomic.fna.gz --refList GCF_025660375.1_ASM2566037v1_genomic.fna/target_genomes_gtdb.txt --output GCF_025660375.1_ASM2566037v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2024-01-24 11:45:31,086] [INFO] Task succeeded: fastANI
[2024-01-24 11:45:31,099] [INFO] Found 16 fastANI hits (1 hits with ANI > circumscription radius)
[2024-01-24 11:45:31,099] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCA_902796085.1	s__RUG14133 sp902796085	99.0973	728	1078	d__Bacteria;p__Firmicutes;c__Bacilli;o__DSM-16016;f__Caldibacillaceae;g__RUG14133	95.0	N/A	N/A	N/A	N/A	1	conclusive
GCF_009996845.1	s__RUG14133 sp009996845	82.243	593	1078	d__Bacteria;p__Firmicutes;c__Bacilli;o__DSM-16016;f__Caldibacillaceae;g__RUG14133	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001244735.1	s__Bacillus_J andreraoultii	78.3053	202	1078	d__Bacteria;p__Firmicutes;c__Bacilli;o__DSM-16016;f__Caldibacillaceae;g__Bacillus_J	95.0	100.00	100.00	0.99	0.99	4	-
GCF_001375515.1	s__Bacillus_BK kwashiorkori	77.7934	103	1078	d__Bacteria;p__Firmicutes;c__Bacilli;o__DSM-16016;f__Caldibacillaceae;g__Bacillus_BK	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000751775.1	s__Bacillus_J thermoamylovorans	76.9589	155	1078	d__Bacteria;p__Firmicutes;c__Bacilli;o__DSM-16016;f__Caldibacillaceae;g__Bacillus_J	95.0	98.31	97.86	0.86	0.85	9	-
GCF_016772275.1	s__Heyndrickxia vini	76.8697	82	1078	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales_B;f__Bacillaceae_C;g__Heyndrickxia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001375675.1	s__Massilibacterium senegalense	76.7255	56	1078	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales_E;f__Massilibacteriaceae;g__Massilibacterium	95.0	100.00	100.00	1.00	1.00	2	-
GCF_900217795.1	s__Ureibacillus xyleni	76.6728	55	1078	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales_A;f__Planococcaceae;g__Ureibacillus	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003259935.1	s__Anoxybacillus_B vitaminiphilus	76.4842	67	1078	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Anoxybacillaceae;g__Anoxybacillus_B	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000242895.2	s__Bacillus_BU sp000242895	76.4611	61	1078	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales_B;f__DSM-18226;g__Bacillus_BU	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001273755.1	s__Peribacillus butanolivorans	76.3442	59	1078	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales_B;f__DSM-1321;g__Peribacillus	95.0	97.70	97.35	0.87	0.86	10	-
GCF_003628435.1	s__Ureibacillus endophyticus	76.327	59	1078	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales_A;f__Planococcaceae;g__Ureibacillus	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003711845.1	s__Ureibacillus halotolerans	76.0111	57	1078	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales_A;f__Planococcaceae;g__Ureibacillus	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001884235.1	s__Bacillus_A paramycoides	75.8881	63	1078	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae_G;g__Bacillus_A	95.0	97.72	97.18	0.88	0.85	10	-
GCA_012840465.1	s__Calidifontibacillus sp012840465	75.799	54	1078	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales_C;f__Bacillaceae_J;g__Calidifontibacillus	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000307855.1	s__Calidifontibacillus azotoformans	75.6242	59	1078	d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales_C;f__Bacillaceae_J;g__Calidifontibacillus	95.0	97.05	97.05	0.89	0.89	2	-
--------------------------------------------------------------------------------
[2024-01-24 11:45:31,101] [INFO] GTDB search result was written to GCF_025660375.1_ASM2566037v1_genomic.fna/result_gtdb.tsv
[2024-01-24 11:45:31,101] [INFO] ===== GTDB Search completed =====
[2024-01-24 11:45:31,105] [INFO] DFAST_QC result json was written to GCF_025660375.1_ASM2566037v1_genomic.fna/dqc_result.json
[2024-01-24 11:45:31,106] [INFO] DFAST_QC completed!
[2024-01-24 11:45:31,106] [INFO] Total running time: 0h1m8s