[2024-01-24 13:08:58,475] [INFO] DFAST_QC pipeline started.
[2024-01-24 13:08:58,476] [INFO] DFAST_QC version: 0.5.7
[2024-01-24 13:08:58,476] [INFO] DQC Reference Directory: /var/lib/cwl/stg882bd431-b2f7-4096-a683-5e4145e3189e/dqc_reference
[2024-01-24 13:08:59,756] [INFO] ===== Start taxonomy check using ANI =====
[2024-01-24 13:08:59,757] [INFO] Task started: Prodigal
[2024-01-24 13:08:59,758] [INFO] Running command: gunzip -c /var/lib/cwl/stg4cd74ae1-c975-413a-905a-d734fd4778ec/GCF_000577895.1_M2_40_genomic.fna.gz | prodigal -d GCF_000577895.1_M2_40_genomic.fna/cds.fna -a GCF_000577895.1_M2_40_genomic.fna/protein.faa -g 11 -q > /dev/null
[2024-01-24 13:09:05,433] [INFO] Task succeeded: Prodigal
[2024-01-24 13:09:05,433] [INFO] Task started: HMMsearch
[2024-01-24 13:09:05,433] [INFO] Running command: hmmsearch --tblout GCF_000577895.1_M2_40_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg882bd431-b2f7-4096-a683-5e4145e3189e/dqc_reference/reference_markers.hmm GCF_000577895.1_M2_40_genomic.fna/protein.faa > /dev/null
[2024-01-24 13:09:05,737] [INFO] Task succeeded: HMMsearch
[2024-01-24 13:09:05,739] [INFO] Found 6/6 markers.
[2024-01-24 13:09:05,770] [INFO] Query marker FASTA was written to GCF_000577895.1_M2_40_genomic.fna/markers.fasta
[2024-01-24 13:09:05,771] [INFO] Task started: Blastn
[2024-01-24 13:09:05,771] [INFO] Running command: blastn -query GCF_000577895.1_M2_40_genomic.fna/markers.fasta -db /var/lib/cwl/stg882bd431-b2f7-4096-a683-5e4145e3189e/dqc_reference/reference_markers.fasta -out GCF_000577895.1_M2_40_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 13:09:06,360] [INFO] Task succeeded: Blastn
[2024-01-24 13:09:06,364] [INFO] Selected 29 target genomes.
[2024-01-24 13:09:06,365] [INFO] Target genome list was writen to GCF_000577895.1_M2_40_genomic.fna/target_genomes.txt
[2024-01-24 13:09:06,520] [INFO] Task started: fastANI
[2024-01-24 13:09:06,521] [INFO] Running command: fastANI --query /var/lib/cwl/stg4cd74ae1-c975-413a-905a-d734fd4778ec/GCF_000577895.1_M2_40_genomic.fna.gz --refList GCF_000577895.1_M2_40_genomic.fna/target_genomes.txt --output GCF_000577895.1_M2_40_genomic.fna/fastani_result.tsv --threads 1
[2024-01-24 13:09:22,773] [INFO] Task succeeded: fastANI
[2024-01-24 13:09:22,773] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg882bd431-b2f7-4096-a683-5e4145e3189e/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2024-01-24 13:09:22,774] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg882bd431-b2f7-4096-a683-5e4145e3189e/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2024-01-24 13:09:22,801] [INFO] Found 23 fastANI hits (1 hits with ANI > threshold)
[2024-01-24 13:09:22,802] [INFO] The taxonomy check result is classified as 'conclusive'.
[2024-01-24 13:09:22,802] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Clostridium bornimense	strain=type strain: M2/40	GCA_000577895.1	1216932	1216932	type	True	100.0	1202	1205	95	conclusive
Clostridium paraputrificum	strain=NCTC11833	GCA_900447045.1	29363	29363	type	True	77.9023	169	1205	95	below_threshold
Clostridium cellulovorans	strain=743B	GCA_000145275.1	1493	1493	type	True	77.7301	172	1205	95	below_threshold
Clostridium isatidis	strain=DSM 15098	GCA_002285495.1	182773	182773	type	True	77.6305	142	1205	95	below_threshold
Clostridium lundense	strain=DSM 17049	GCA_000619945.1	319475	319475	type	True	77.5504	119	1205	95	below_threshold
Sarcina ventriculi	strain=NCTC12966	GCA_900456775.1	1267	1267	type	True	77.5237	141	1205	95	below_threshold
Clostridium perfringens	strain=ATCC 13124	GCA_000013285.1	1502	1502	type	True	77.3068	150	1205	95	below_threshold
Clostridium fallax	strain=NCTC8380	GCA_900461065.1	1533	1533	type	True	77.2487	157	1205	95	below_threshold
Clostridium tetanomorphum	strain=DSM 4474	GCA_017873215.1	1553	1553	type	True	76.988	154	1205	95	below_threshold
Clostridium saudiense	strain=JCC	GCA_000577815.1	1414720	1414720	type	True	76.7394	169	1205	95	below_threshold
Clostridium thermobutyricum	strain=DSM 4928	GCA_002050515.1	29372	29372	type	True	76.7078	177	1205	95	below_threshold
Clostridium niameyense	strain=MT5	GCA_001243045.1	1622073	1622073	type	True	76.5931	139	1205	95	below_threshold
Clostridium moniliforme	strain=DSM 3984	GCA_017873235.1	39489	39489	type	True	76.3537	162	1205	95	below_threshold
Clostridium mobile	strain=MSJ-11	GCA_018918285.1	2841512	2841512	type	True	76.2365	103	1205	95	below_threshold
Clostridium simiarum	strain=MSJ-4	GCA_018919175.1	2841506	2841506	type	True	76.2222	111	1205	95	below_threshold
Clostridium botulinum	strain=ATCC 25763	GCA_011017965.1	1491	1491	type	True	76.2152	147	1205	95	below_threshold
Clostridium weizhouense	strain=YB-6	GCA_019431045.1	2859781	2859781	type	True	76.2108	190	1205	95	below_threshold
Clostridium faecium	strain=N37	GCA_014836835.1	2762223	2762223	type	True	76.206	147	1205	95	below_threshold
Clostridium senegalense	strain=type strain: JC122	GCA_000285575.1	1465809	1465809	type	True	76.1869	148	1205	95	below_threshold
Clostridium acetireducens	strain=DSM 10703	GCA_001758365.1	76489	76489	type	True	76.1721	126	1205	95	below_threshold
Clostridium botulinum	strain=ATCC 25763	GCA_001276985.1	1491	1491	type	True	76.158	151	1205	95	below_threshold
Clostridium tepidiprofundi	strain=DSM 19306	GCA_001594005.1	420412	420412	type	True	75.8469	86	1205	95	below_threshold
Clostridium prolinivorans	strain=PYR-10	GCA_004011155.1	2769420	2769420	type	True	75.7864	127	1205	95	below_threshold
--------------------------------------------------------------------------------
[2024-01-24 13:09:22,804] [INFO] DFAST Taxonomy check result was written to GCF_000577895.1_M2_40_genomic.fna/tc_result.tsv
[2024-01-24 13:09:22,804] [INFO] ===== Taxonomy check completed =====
[2024-01-24 13:09:22,804] [INFO] ===== Start completeness check using CheckM =====
[2024-01-24 13:09:22,805] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg882bd431-b2f7-4096-a683-5e4145e3189e/dqc_reference/checkm_data
[2024-01-24 13:09:22,806] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2024-01-24 13:09:22,842] [INFO] Task started: CheckM
[2024-01-24 13:09:22,843] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCF_000577895.1_M2_40_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCF_000577895.1_M2_40_genomic.fna/checkm_input GCF_000577895.1_M2_40_genomic.fna/checkm_result
[2024-01-24 13:09:46,500] [INFO] Task succeeded: CheckM
[2024-01-24 13:09:46,501] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2024-01-24 13:09:46,521] [INFO] ===== Completeness check finished =====
[2024-01-24 13:09:46,521] [INFO] ===== Start GTDB Search =====
[2024-01-24 13:09:46,522] [INFO] Query marker FASTA already exists. Will reuse it. (GCF_000577895.1_M2_40_genomic.fna/markers.fasta)
[2024-01-24 13:09:46,522] [INFO] Task started: Blastn
[2024-01-24 13:09:46,522] [INFO] Running command: blastn -query GCF_000577895.1_M2_40_genomic.fna/markers.fasta -db /var/lib/cwl/stg882bd431-b2f7-4096-a683-5e4145e3189e/dqc_reference/reference_markers_gtdb.fasta -out GCF_000577895.1_M2_40_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 13:09:47,389] [INFO] Task succeeded: Blastn
[2024-01-24 13:09:47,393] [INFO] Selected 23 target genomes.
[2024-01-24 13:09:47,393] [INFO] Target genome list was writen to GCF_000577895.1_M2_40_genomic.fna/target_genomes_gtdb.txt
[2024-01-24 13:09:47,542] [INFO] Task started: fastANI
[2024-01-24 13:09:47,543] [INFO] Running command: fastANI --query /var/lib/cwl/stg4cd74ae1-c975-413a-905a-d734fd4778ec/GCF_000577895.1_M2_40_genomic.fna.gz --refList GCF_000577895.1_M2_40_genomic.fna/target_genomes_gtdb.txt --output GCF_000577895.1_M2_40_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2024-01-24 13:10:02,120] [INFO] Task succeeded: fastANI
[2024-01-24 13:10:02,143] [INFO] Found 20 fastANI hits (1 hits with ANI > circumscription radius)
[2024-01-24 13:10:02,143] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_000577895.1	s__Clostridium_AN bornimense	100.0	1202	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_AN	95.0	N/A	N/A	N/A	N/A	1	conclusive
GCF_018917145.1	s__Clostridium_AN bornimense_A	88.1235	886	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_AN	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900447045.1	s__Clostridium paraputrificum	77.9132	169	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium	95.0	97.58	95.42	0.89	0.83	29	-
GCF_002285495.1	s__Clostridium isatidis	77.5271	141	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium	95.0	99.19	99.19	0.81	0.81	2	-
GCA_000753435.1	s__Clostridium_L amazonitimonense	77.4923	118	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_L	95.0	99.33	97.98	0.97	0.93	4	-
GCF_900456775.1	s__Clostridium_P ventriculi	77.4334	139	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_P	95.0	98.93	98.34	0.95	0.94	8	-
GCF_006742065.1	s__Clostridium butyricum	77.296	189	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium	95.0	98.58	97.46	0.89	0.80	64	-
GCF_002050515.1	s__Clostridium thermobutyricum	76.7108	179	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium	95.0	97.10	97.10	0.91	0.89	3	-
GCF_900129365.1	s__Clostridium_AH fallax	76.5401	150	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_AH	95.0	100.00	100.00	1.00	1.00	2	-
GCF_000401215.1	s__Clostridium sartagoforme_A	76.3435	172	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium	95.0	98.25	98.25	0.85	0.85	2	-
GCA_017888565.1	s__Clostridium sp017888565	76.3078	165	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_018918285.1	s__MSJ-11 sp018918285	76.2365	103	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__MSJ-11	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000285575.1	s__Clostridium_J senegalense	76.1954	147	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_J	95.0	97.19	95.52	0.93	0.91	5	-
GCF_001276985.1	s__Clostridium_F botulinum	76.1686	150	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_F	95.0	97.69	95.97	0.92	0.83	213	-
GCF_001758365.1	s__Clostridium_C acetireducens	76.1599	127	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_C	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000612845.1	s__Clostridium_J ihumii	76.0532	170	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_J	95.0	100.00	100.00	0.99	0.99	2	-
GCF_001276215.1	s__Clostridium_F sp001276215	76.017	147	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_F	95.0	97.32	95.44	0.88	0.82	10	-
GCA_002341865.1	s__Clostridium_J sp002341865	75.9007	124	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_J	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900167505.1	s__F0540 sp900167505	75.8806	126	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__F0540	95.0	98.87	98.87	0.90	0.90	2	-
GCA_002455975.1	s__Clostridium_J sp002455975	75.8344	124	1205	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_J	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2024-01-24 13:10:02,146] [INFO] GTDB search result was written to GCF_000577895.1_M2_40_genomic.fna/result_gtdb.tsv
[2024-01-24 13:10:02,147] [INFO] ===== GTDB Search completed =====
[2024-01-24 13:10:02,152] [INFO] DFAST_QC result json was written to GCF_000577895.1_M2_40_genomic.fna/dqc_result.json
[2024-01-24 13:10:02,152] [INFO] DFAST_QC completed!
[2024-01-24 13:10:02,152] [INFO] Total running time: 0h1m4s