[2023-06-28 12:02:01,495] [INFO] DFAST_QC pipeline started.
[2023-06-28 12:02:01,498] [INFO] DFAST_QC version: 0.5.7
[2023-06-28 12:02:01,498] [INFO] DQC Reference Directory: /var/lib/cwl/stg2bbc067e-66f7-4950-9efe-c12c795e54ca/dqc_reference
[2023-06-28 12:02:02,885] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-28 12:02:02,886] [INFO] Task started: Prodigal
[2023-06-28 12:02:02,886] [INFO] Running command: gunzip -c /var/lib/cwl/stga77c9b9f-5a68-43c7-a5bb-395a7b381144/GCA_005880435.1_ASM588043v1_genomic.fna.gz | prodigal -d GCA_005880435.1_ASM588043v1_genomic.fna/cds.fna -a GCA_005880435.1_ASM588043v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-28 12:02:19,183] [INFO] Task succeeded: Prodigal
[2023-06-28 12:02:19,183] [INFO] Task started: HMMsearch
[2023-06-28 12:02:19,183] [INFO] Running command: hmmsearch --tblout GCA_005880435.1_ASM588043v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg2bbc067e-66f7-4950-9efe-c12c795e54ca/dqc_reference/reference_markers.hmm GCA_005880435.1_ASM588043v1_genomic.fna/protein.faa > /dev/null
[2023-06-28 12:02:19,444] [INFO] Task succeeded: HMMsearch
[2023-06-28 12:02:19,446] [INFO] Found 6/6 markers.
[2023-06-28 12:02:19,481] [INFO] Query marker FASTA was written to GCA_005880435.1_ASM588043v1_genomic.fna/markers.fasta
[2023-06-28 12:02:19,481] [INFO] Task started: Blastn
[2023-06-28 12:02:19,482] [INFO] Running command: blastn -query GCA_005880435.1_ASM588043v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg2bbc067e-66f7-4950-9efe-c12c795e54ca/dqc_reference/reference_markers.fasta -out GCA_005880435.1_ASM588043v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-28 12:02:20,159] [INFO] Task succeeded: Blastn
[2023-06-28 12:02:20,164] [INFO] Selected 20 target genomes.
[2023-06-28 12:02:20,165] [INFO] Target genome list was writen to GCA_005880435.1_ASM588043v1_genomic.fna/target_genomes.txt
[2023-06-28 12:02:20,173] [INFO] Task started: fastANI
[2023-06-28 12:02:20,173] [INFO] Running command: fastANI --query /var/lib/cwl/stga77c9b9f-5a68-43c7-a5bb-395a7b381144/GCA_005880435.1_ASM588043v1_genomic.fna.gz --refList GCA_005880435.1_ASM588043v1_genomic.fna/target_genomes.txt --output GCA_005880435.1_ASM588043v1_genomic.fna/fastani_result.tsv --threads 1
[2023-06-28 12:02:31,941] [INFO] Task succeeded: fastANI
[2023-06-28 12:02:31,942] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg2bbc067e-66f7-4950-9efe-c12c795e54ca/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-28 12:02:31,943] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg2bbc067e-66f7-4950-9efe-c12c795e54ca/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-28 12:02:31,953] [INFO] Found 7 fastANI hits (0 hits with ANI > threshold)
[2023-06-28 12:02:31,953] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-06-28 12:02:31,953] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Symbiobacterium terraclitae	strain=DSM 27138	GCA_017874315.1	557451	557451	type	True	75.143	53	938	95	below_threshold
Oharaeibacter diazotrophicus	strain=SM30	GCA_011317485.1	1920512	1920512	type	True	74.9387	77	938	95	below_threshold
Pseudoxanthomonas broegbernensis	strain=DSM 12573	GCA_014202435.1	83619	83619	type	True	74.9182	54	938	95	below_threshold
Methylobacterium nonmethylotrophicum	strain=6HR-1	GCA_004745635.1	1141884	1141884	type	True	74.8839	97	938	95	below_threshold
Dokdonella koreensis	strain=DS-123	GCA_001632775.1	323415	323415	type	True	74.8683	62	938	95	below_threshold
Oharaeibacter diazotrophicus	strain=DSM 102969	GCA_004362745.1	1920512	1920512	type	True	74.8498	96	938	95	below_threshold
Chitinimonas koreensis	strain=DSM 17726	GCA_000428465.1	356302	356302	type	True	74.7444	93	938	95	below_threshold
--------------------------------------------------------------------------------
[2023-06-28 12:02:31,956] [INFO] DFAST Taxonomy check result was written to GCA_005880435.1_ASM588043v1_genomic.fna/tc_result.tsv
[2023-06-28 12:02:31,956] [INFO] ===== Taxonomy check completed =====
[2023-06-28 12:02:31,956] [INFO] ===== Start completeness check using CheckM =====
[2023-06-28 12:02:31,957] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg2bbc067e-66f7-4950-9efe-c12c795e54ca/dqc_reference/checkm_data
[2023-06-28 12:02:31,958] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-28 12:02:31,997] [INFO] Task started: CheckM
[2023-06-28 12:02:31,997] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_005880435.1_ASM588043v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_005880435.1_ASM588043v1_genomic.fna/checkm_input GCA_005880435.1_ASM588043v1_genomic.fna/checkm_result
[2023-06-28 12:03:18,200] [INFO] Task succeeded: CheckM
[2023-06-28 12:03:18,201] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-06-28 12:03:18,226] [INFO] ===== Completeness check finished =====
[2023-06-28 12:03:18,226] [INFO] ===== Start GTDB Search =====
[2023-06-28 12:03:18,227] [INFO] Query marker FASTA already exists. Will reuse it. (GCA_005880435.1_ASM588043v1_genomic.fna/markers.fasta)
[2023-06-28 12:03:18,228] [INFO] Task started: Blastn
[2023-06-28 12:03:18,228] [INFO] Running command: blastn -query GCA_005880435.1_ASM588043v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg2bbc067e-66f7-4950-9efe-c12c795e54ca/dqc_reference/reference_markers_gtdb.fasta -out GCA_005880435.1_ASM588043v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-28 12:03:19,154] [INFO] Task succeeded: Blastn
[2023-06-28 12:03:19,158] [INFO] Selected 20 target genomes.
[2023-06-28 12:03:19,158] [INFO] Target genome list was writen to GCA_005880435.1_ASM588043v1_genomic.fna/target_genomes_gtdb.txt
[2023-06-28 12:03:19,176] [INFO] Task started: fastANI
[2023-06-28 12:03:19,177] [INFO] Running command: fastANI --query /var/lib/cwl/stga77c9b9f-5a68-43c7-a5bb-395a7b381144/GCA_005880435.1_ASM588043v1_genomic.fna.gz --refList GCA_005880435.1_ASM588043v1_genomic.fna/target_genomes_gtdb.txt --output GCA_005880435.1_ASM588043v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-28 12:03:30,824] [INFO] Task succeeded: fastANI
[2023-06-28 12:03:30,837] [INFO] Found 15 fastANI hits (1 hits with ANI > circumscription radius)
[2023-06-28 12:03:30,837] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCA_005880695.1	s__VBGR01 sp005880695	99.4545	826	938	d__Bacteria;p__Dormibacterota;c__Dormibacteria;o__Dormibacterales;f__Dormibacteraceae;g__VBGR01	95.0	99.40	99.35	0.94	0.93	3	conclusive
GCA_005881155.1	s__CF-105 sp005881155	78.7052	355	938	d__Bacteria;p__Dormibacterota;c__Dormibacteria;o__Dormibacterales;f__Dormibacteraceae;g__CF-105	95.0	98.89	98.89	0.81	0.81	2	-
GCA_002427995.1	s__UBA6019 sp002427995	77.7056	214	938	d__Bacteria;p__Dormibacterota;c__Dormibacteria;o__Dormibacterales;f__Dormibacteraceae;g__UBA6019	95.0	N/A	N/A	N/A	N/A	1	-
GCA_002404015.1	s__UBA4736 sp002404015	77.2707	223	938	d__Bacteria;p__Dormibacterota;c__Dormibacteria;o__Dormibacterales;f__Dormibacteraceae;g__UBA4736	95.0	N/A	N/A	N/A	N/A	1	-
GCA_005879985.1	s__CF-118 sp005879985	77.0801	161	938	d__Bacteria;p__Dormibacterota;c__Dormibacteria;o__Dormibacterales;f__Dormibacteraceae;g__CF-118	95.0	99.16	99.16	0.85	0.85	2	-
GCA_005880795.1	s__40CM-4-65-16 sp005880795	77.062	175	938	d__Bacteria;p__Dormibacterota;c__Dormibacteria;o__Dormibacterales;f__Dormibacteraceae;g__40CM-4-65-16	95.0	99.34	99.06	0.87	0.86	4	-
GCA_005880855.1	s__UBA10449 sp005880855	77.0558	151	938	d__Bacteria;p__Dormibacterota;c__Dormibacteria;o__Dormibacterales;f__Dormibacteraceae;g__UBA10449	95.0	99.07	99.07	0.87	0.87	2	-
GCA_005879675.1	s__40CM-4-65-16 sp005879675	76.9993	150	938	d__Bacteria;p__Dormibacterota;c__Dormibacteria;o__Dormibacterales;f__Dormibacteraceae;g__40CM-4-65-16	95.0	N/A	N/A	N/A	N/A	1	-
GCA_004299235.1	s__Palsa-870 sp004299235	76.9828	183	938	d__Bacteria;p__Dormibacterota;c__Dormibacteria;o__Dormibacterales;f__Dormibacteraceae;g__Palsa-870	95.0	N/A	N/A	N/A	N/A	1	-
GCA_005880525.1	s__CF-118 sp005880525	76.803	134	938	d__Bacteria;p__Dormibacterota;c__Dormibacteria;o__Dormibacterales;f__Dormibacteraceae;g__CF-118	95.0	97.95	97.81	0.79	0.75	3	-
GCA_005880175.1	s__CF-161 sp005880175	76.8028	248	938	d__Bacteria;p__Dormibacterota;c__Dormibacteria;o__Dormibacterales;f__Dormibacteraceae;g__CF-161	95.0	N/A	N/A	N/A	N/A	1	-
GCA_003169845.1	s__Palsa-870 sp003169845	76.7847	159	938	d__Bacteria;p__Dormibacterota;c__Dormibacteria;o__Dormibacterales;f__Dormibacteraceae;g__Palsa-870	95.0	99.95	99.95	0.97	0.97	2	-
GCA_003446775.1	s__UBA10449 sp003446775	76.7419	152	938	d__Bacteria;p__Dormibacterota;c__Dormibacteria;o__Dormibacterales;f__Dormibacteraceae;g__UBA10449	95.0	99.60	99.60	0.89	0.89	2	-
GCA_005880345.1	s__40CM-4-65-16 sp005880345	76.6421	110	938	d__Bacteria;p__Dormibacterota;c__Dormibacteria;o__Dormibacterales;f__Dormibacteraceae;g__40CM-4-65-16	95.0	N/A	N/A	N/A	N/A	1	-
GCA_004366205.1	s__SIRW01 sp004366205	75.162	67	938	d__Bacteria;p__Actinobacteriota;c__Acidimicrobiia;o__Acidimicrobiales;f__SIRW01;g__SIRW01	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2023-06-28 12:03:30,839] [INFO] GTDB search result was written to GCA_005880435.1_ASM588043v1_genomic.fna/result_gtdb.tsv
[2023-06-28 12:03:30,840] [INFO] ===== GTDB Search completed =====
[2023-06-28 12:03:30,843] [INFO] DFAST_QC result json was written to GCA_005880435.1_ASM588043v1_genomic.fna/dqc_result.json
[2023-06-28 12:03:30,844] [INFO] DFAST_QC completed!
[2023-06-28 12:03:30,844] [INFO] Total running time: 0h1m29s