[2024-01-24 13:19:28,021] [INFO] DFAST_QC pipeline started.
[2024-01-24 13:19:28,028] [INFO] DFAST_QC version: 0.5.7
[2024-01-24 13:19:28,028] [INFO] DQC Reference Directory: /var/lib/cwl/stg874b7492-2c3a-47e7-88bf-e16816c274fa/dqc_reference
[2024-01-24 13:19:29,395] [INFO] ===== Start taxonomy check using ANI =====
[2024-01-24 13:19:29,396] [INFO] Task started: Prodigal
[2024-01-24 13:19:29,397] [INFO] Running command: gunzip -c /var/lib/cwl/stg0503dd45-4dcb-4480-bdd2-a171da95eeea/GCF_009664085.1_ASM966408v1_genomic.fna.gz | prodigal -d GCF_009664085.1_ASM966408v1_genomic.fna/cds.fna -a GCF_009664085.1_ASM966408v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2024-01-24 13:19:38,713] [INFO] Task succeeded: Prodigal
[2024-01-24 13:19:38,713] [INFO] Task started: HMMsearch
[2024-01-24 13:19:38,713] [INFO] Running command: hmmsearch --tblout GCF_009664085.1_ASM966408v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg874b7492-2c3a-47e7-88bf-e16816c274fa/dqc_reference/reference_markers.hmm GCF_009664085.1_ASM966408v1_genomic.fna/protein.faa > /dev/null
[2024-01-24 13:19:38,971] [INFO] Task succeeded: HMMsearch
[2024-01-24 13:19:38,972] [INFO] Found 6/6 markers.
[2024-01-24 13:19:39,003] [INFO] Query marker FASTA was written to GCF_009664085.1_ASM966408v1_genomic.fna/markers.fasta
[2024-01-24 13:19:39,004] [INFO] Task started: Blastn
[2024-01-24 13:19:39,004] [INFO] Running command: blastn -query GCF_009664085.1_ASM966408v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg874b7492-2c3a-47e7-88bf-e16816c274fa/dqc_reference/reference_markers.fasta -out GCF_009664085.1_ASM966408v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 13:19:39,766] [INFO] Task succeeded: Blastn
[2024-01-24 13:19:39,770] [INFO] Selected 18 target genomes.
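Every entry in this log follows a `[timestamp] [LEVEL] message` layout, and each external tool call is bracketed by `Task started:` and `Task succeeded:` lines. A minimal Python sketch for recovering per-task wall-clock times from such lines follows; the regular expression and the names `task_durations` and `parse_time` are illustrative assumptions, not part of DFAST_QC.

```python
import re
from datetime import datetime

# One log entry: "[YYYY-MM-DD HH:MM:SS,mmm] [LEVEL] message"
ENTRY = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] \[(\w+)\] (.*)$")
STARTED = "Task started: "
SUCCEEDED = "Task succeeded: "


def parse_time(stamp):
    # strptime's %f accepts the 3-digit millisecond field and pads it to microseconds.
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def task_durations(lines):
    """Pair each 'Task started: X' entry with the next 'Task succeeded: X' entry
    and return (task, seconds) tuples in completion order. Task names may repeat
    (Blastn and fastANI run twice), so each start is consumed once it is matched."""
    pending = {}
    durations = []
    for line in lines:
        m = ENTRY.match(line.strip())
        if m is None:
            continue  # table rows, separator lines, etc.
        stamp, _level, message = m.groups()
        if message.startswith(STARTED):
            pending[message[len(STARTED):]] = parse_time(stamp)
        elif message.startswith(SUCCEEDED):
            task = message[len(SUCCEEDED):]
            if task in pending:
                elapsed = parse_time(stamp) - pending.pop(task)
                durations.append((task, elapsed.total_seconds()))
    return durations


if __name__ == "__main__":
    sample = [
        "[2024-01-24 13:19:29,396] [INFO] Task started: Prodigal",
        "[2024-01-24 13:19:38,713] [INFO] Task succeeded: Prodigal",
        "[2024-01-24 13:19:39,004] [INFO] Task started: Blastn",
        "[2024-01-24 13:19:39,766] [INFO] Task succeeded: Blastn",
    ]
    for task, seconds in task_durations(sample):
        print(f"{task}: {seconds:.3f} s")
```

Matching on the message prefix rather than on fixed positions keeps the sketch tolerant of the interleaved result tables, which simply fail the regular expression and are skipped.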
[2024-01-24 13:19:39,771] [INFO] Target genome list was writen to GCF_009664085.1_ASM966408v1_genomic.fna/target_genomes.txt
[2024-01-24 13:19:39,788] [INFO] Task started: fastANI
[2024-01-24 13:19:39,788] [INFO] Running command: fastANI --query /var/lib/cwl/stg0503dd45-4dcb-4480-bdd2-a171da95eeea/GCF_009664085.1_ASM966408v1_genomic.fna.gz --refList GCF_009664085.1_ASM966408v1_genomic.fna/target_genomes.txt --output GCF_009664085.1_ASM966408v1_genomic.fna/fastani_result.tsv --threads 1
[2024-01-24 13:19:54,066] [INFO] Task succeeded: fastANI
[2024-01-24 13:19:54,067] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg874b7492-2c3a-47e7-88bf-e16816c274fa/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2024-01-24 13:19:54,067] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg874b7492-2c3a-47e7-88bf-e16816c274fa/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2024-01-24 13:19:54,085] [INFO] Found 17 fastANI hits (0 hits with ANI > threshold)
[2024-01-24 13:19:54,085] [INFO] The taxonomy check result is classified as 'below_threshold'.
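fastANI writes one tab-separated line per query/reference pair (query, reference, ANI, mapped fragments, total query fragments). Because the species-specific threshold file was missing, every hit was compared against the same default threshold (95 in the `ani_threshold` column of the table that follows). The sketch below is a hypothetical helper, not DFAST_QC's own code: it counts hits with ANI strictly above that threshold and reports `below_threshold` when none qualify, which is the outcome logged here. The reference file names in the sample input are made up.

```python
import csv
import io

# Default threshold used in this run (ani_threshold column = 95), because the
# species-specific threshold file was not found.
DEFAULT_ANI_THRESHOLD = 95.0


def read_fastani(handle):
    """Yield (reference, ani, matched_fragments, total_fragments) tuples from
    fastANI's tab-separated output (query, reference, ANI, mapped fragments,
    total query fragments)."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 5:
            continue
        _query, reference, ani, matched, total = row[:5]
        yield reference, float(ani), int(matched), int(total)


def classify(hits, threshold=DEFAULT_ANI_THRESHOLD):
    """Return (hits above threshold, status). Only 'below_threshold' is confirmed
    by this log; for other outcomes the status is left as None, because DFAST_QC's
    own label (and any extra checks it applies) is not shown here."""
    above = [hit for hit in hits if hit[1] > threshold]
    status = "below_threshold" if not above else None
    return above, status


if __name__ == "__main__":
    # Hypothetical fastANI rows: file names are invented, the ANI and fragment
    # values are copied from the taxonomy check table.
    fastani_tsv = io.StringIO(
        "query.fna.gz\tGCA_018409545.1.fna.gz\t86.3334\t803\t986\n"
        "query.fna.gz\tGCA_013385175.1.fna.gz\t86.0066\t798\t986\n"
        "query.fna.gz\tGCA_000025485.1.fna.gz\t85.4891\t748\t986\n"
    )
    above, status = classify(list(read_fastani(fastani_tsv)))
    print(f"{len(above)} hits with ANI > threshold -> {status}")
```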
[2024-01-24 13:19:54,085] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Allochromatium tepidum	strain=NZ	GCA_018409545.1	553982	553982	type	True	86.3334	803	986	95	below_threshold
Allochromatium humboldtianum	strain=DSM 21881	GCA_013385175.1	504901	504901	type	True	86.0066	798	986	95	below_threshold
Allochromatium vinosum	strain=DSM 180	GCA_000025485.1	1049	1049	type	True	85.4891	748	986	95	below_threshold
Thiocystis violacea	strain=DSM 207	GCA_016583575.1	13725	13725	type	True	79.1817	518	986	95	below_threshold
Marichromatium gracile	strain=DSM 203	GCA_004343155.1	1048	1048	type	True	78.8757	454	986	95	below_threshold
Marichromatium purpuratum	strain=984	GCA_000224005.3	37487	37487	type	True	78.8702	433	986	95	below_threshold
Thiorhodococcus minor	strain=DSM 11518	GCA_010820565.1	57489	57489	type	True	78.442	410	986	95	below_threshold
Allochromatium warmingii	strain=DSM 173	GCA_900107145.1	61595	61595	type	True	78.3222	229	986	95	below_threshold
Caldichromatium japonicum	strain=No.7	GCA_011290485.1	2699430	2699430	type	True	78.2208	411	986	95	below_threshold
Thiocapsa rosea	strain=DSM 235	GCA_003634315.1	69360	69360	type	True	78.1044	323	986	95	below_threshold
Thiocapsa roseopersicina	strain=DSM 217	GCA_900106925.1	1058	1058	type	True	78.0057	316	986	95	below_threshold
Thiocapsa bogorovii	strain=BBS	GCA_021228795.1	521689	521689	type	True	77.9263	291	986	95	below_threshold
Thioflavicoccus mobilis	strain=8321	GCA_000327045.1	80679	80679	type	True	77.8135	236	986	95	below_threshold
Luteimonas granuli	strain=Gr-4	GCA_007795095.1	1176533	1176533	type	True	76.6519	74	986	95	below_threshold
Pseudomonas cavernae	strain=K2W31S-8	GCA_003595175.1	2320867	2320867	type	True	76.6469	85	986	95	below_threshold
Pseudomonas hydrolytica	strain=DSWY01	GCA_021495345.2	2493633	2493633	type	True	76.5092	87	986	95	below_threshold
Pseudomonas insulae	strain=UL073	GCA_016901015.1	2809017	2809017	type	True	75.6769	79	986	95	below_threshold
--------------------------------------------------------------------------------
[2024-01-24 13:19:54,087] [INFO] DFAST Taxonomy check result was written to GCF_009664085.1_ASM966408v1_genomic.fna/tc_result.tsv
[2024-01-24 13:19:54,088] [INFO] ===== Taxonomy check completed =====
[2024-01-24 13:19:54,088] [INFO] ===== Start completeness check using CheckM =====
[2024-01-24 13:19:54,088] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg874b7492-2c3a-47e7-88bf-e16816c274fa/dqc_reference/checkm_data
[2024-01-24 13:19:54,089] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2024-01-24 13:19:54,122] [INFO] Task started: CheckM
[2024-01-24 13:19:54,122] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCF_009664085.1_ASM966408v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCF_009664085.1_ASM966408v1_genomic.fna/checkm_input GCF_009664085.1_ASM966408v1_genomic.fna/checkm_result
[2024-01-24 13:20:24,633] [INFO] Task succeeded: CheckM
[2024-01-24 13:20:24,635] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2024-01-24 13:20:24,654] [INFO] ===== Completeness check finished =====
[2024-01-24 13:20:24,654] [INFO] ===== Start GTDB Search =====
[2024-01-24 13:20:24,654] [INFO] Query marker FASTA already exists. Will reuse it. (GCF_009664085.1_ASM966408v1_genomic.fna/markers.fasta)
[2024-01-24 13:20:24,655] [INFO] Task started: Blastn
[2024-01-24 13:20:24,655] [INFO] Running command: blastn -query GCF_009664085.1_ASM966408v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg874b7492-2c3a-47e7-88bf-e16816c274fa/dqc_reference/reference_markers_gtdb.fasta -out GCF_009664085.1_ASM966408v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 13:20:25,743] [INFO] Task succeeded: Blastn
[2024-01-24 13:20:25,748] [INFO] Selected 8 target genomes.
[2024-01-24 13:20:25,748] [INFO] Target genome list was writen to GCF_009664085.1_ASM966408v1_genomic.fna/target_genomes_gtdb.txt
[2024-01-24 13:20:25,771] [INFO] Task started: fastANI
[2024-01-24 13:20:25,771] [INFO] Running command: fastANI --query /var/lib/cwl/stg0503dd45-4dcb-4480-bdd2-a171da95eeea/GCF_009664085.1_ASM966408v1_genomic.fna.gz --refList GCF_009664085.1_ASM966408v1_genomic.fna/target_genomes_gtdb.txt --output GCF_009664085.1_ASM966408v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2024-01-24 13:20:33,104] [INFO] Task succeeded: fastANI
[2024-01-24 13:20:33,119] [INFO] Found 8 fastANI hits (1 hits with ANI > circumscription radius)
[2024-01-24 13:20:33,119] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_009664085.1	s__Thermochromatium tepidum	100.0	985	986	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Chromatiaceae;g__Thermochromatium	95.0	N/A	N/A	N/A	N/A	1	conclusive
GCF_018409545.1	s__Allochromatium tepidum	86.3298	805	986	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Chromatiaceae;g__Allochromatium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_013385175.1	s__Allochromatium humboldtianum	86.0212	797	986	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Chromatiaceae;g__Allochromatium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000025485.1	s__Allochromatium vinosum	85.5071	743	986	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Chromatiaceae;g__Allochromatium	95.0	97.59	97.59	0.87	0.87	2	-
GCF_009720725.1	s__Allochromatium palmeri	83.3632	735	986	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Chromatiaceae;g__Allochromatium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_016583575.1	s__Thiocystis violacea	79.186	517	986	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Chromatiaceae;g__Thiocystis	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900107145.1	s__Allochromatium warmingii	78.3222	229	986	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Chromatiaceae;g__Allochromatium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900106925.1	s__Thiocapsa roseopersicina	78.0047	316	986	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Chromatiaceae;g__Thiocapsa	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2024-01-24 13:20:33,120] [INFO] GTDB search result was written to GCF_009664085.1_ASM966408v1_genomic.fna/result_gtdb.tsv
[2024-01-24 13:20:33,121] [INFO] ===== GTDB Search completed =====
[2024-01-24 13:20:33,124] [INFO] DFAST_QC result json was written to GCF_009664085.1_ASM966408v1_genomic.fna/dqc_result.json
[2024-01-24 13:20:33,124] [INFO] DFAST_QC completed!
[2024-01-24 13:20:33,124] [INFO] Total running time: 0h1m5s
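In the GTDB search, each hit's ANI is compared with the ANI circumscription radius of its species (95.0 for every species listed), and the log reports 1 of 8 hits inside that radius: the self-match GCF_009664085.1, whose `status` is `conclusive`, while every other hit shows `-`. A simplified Python sketch of that labelling follows. It assumes the status depends only on ANI versus radius, whereas DFAST_QC may apply further criteria (the intra-species ANI and aligned-fraction columns hint at this); `GtdbHit` and `assign_status` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GtdbHit:
    accession: str
    gtdb_species: str
    ani: float
    radius: float  # ani_circumscription_radius of the reference species


def assign_status(hits: List[GtdbHit]) -> List[Optional[str]]:
    """Reproduce the 'status' column for the single-match case seen in this log:
    exactly one hit with ANI above its species' circumscription radius is
    'conclusive', and hits outside their radius get '-'. When several hits fall
    inside their radii the real label is not shown here, so those are None."""
    inside = [hit.ani > hit.radius for hit in hits]
    n_inside = sum(inside)
    statuses: List[Optional[str]] = []
    for is_inside in inside:
        if not is_inside:
            statuses.append("-")
        elif n_inside == 1:
            statuses.append("conclusive")
        else:
            statuses.append(None)  # label for multiple matches unknown from this log
    return statuses


if __name__ == "__main__":
    # Values copied from the GTDB search result table.
    hits = [
        GtdbHit("GCF_009664085.1", "s__Thermochromatium tepidum", 100.0, 95.0),
        GtdbHit("GCF_018409545.1", "s__Allochromatium tepidum", 86.3298, 95.0),
        GtdbHit("GCF_000025485.1", "s__Allochromatium vinosum", 85.5071, 95.0),
    ]
    for hit, status in zip(hits, assign_status(hits)):
        print(hit.accession, hit.gtdb_species, status)
```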