[2023-06-28 15:07:13,215] [INFO] DFAST_QC pipeline started.
[2023-06-28 15:07:13,217] [INFO] DFAST_QC version: 0.5.7
[2023-06-28 15:07:13,217] [INFO] DQC Reference Directory: /var/lib/cwl/stgc44abe72-3660-4a2b-82f7-e3c8d5a9d96f/dqc_reference
[2023-06-28 15:07:14,367] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-28 15:07:14,368] [INFO] Task started: Prodigal
[2023-06-28 15:07:14,368] [INFO] Running command: gunzip -c /var/lib/cwl/stg7ab668e1-a5ba-4ca0-88d9-253647486d63/GCA_020851285.1_ASM2085128v1_genomic.fna.gz | prodigal -d GCA_020851285.1_ASM2085128v1_genomic.fna/cds.fna -a GCA_020851285.1_ASM2085128v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-28 15:07:27,958] [INFO] Task succeeded: Prodigal
[2023-06-28 15:07:27,958] [INFO] Task started: HMMsearch
[2023-06-28 15:07:27,958] [INFO] Running command: hmmsearch --tblout GCA_020851285.1_ASM2085128v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stgc44abe72-3660-4a2b-82f7-e3c8d5a9d96f/dqc_reference/reference_markers.hmm GCA_020851285.1_ASM2085128v1_genomic.fna/protein.faa > /dev/null
[2023-06-28 15:07:28,197] [INFO] Task succeeded: HMMsearch
[2023-06-28 15:07:28,198] [INFO] Found 6/6 markers.
[2023-06-28 15:07:28,238] [INFO] Query marker FASTA was written to GCA_020851285.1_ASM2085128v1_genomic.fna/markers.fasta
[2023-06-28 15:07:28,239] [INFO] Task started: Blastn
[2023-06-28 15:07:28,239] [INFO] Running command: blastn -query GCA_020851285.1_ASM2085128v1_genomic.fna/markers.fasta -db /var/lib/cwl/stgc44abe72-3660-4a2b-82f7-e3c8d5a9d96f/dqc_reference/reference_markers.fasta -out GCA_020851285.1_ASM2085128v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-28 15:07:28,854] [INFO] Task succeeded: Blastn
[2023-06-28 15:07:28,857] [INFO] Selected 29 target genomes.
[2023-06-28 15:07:28,858] [INFO] Target genome list was writen to GCA_020851285.1_ASM2085128v1_genomic.fna/target_genomes.txt
[2023-06-28 15:07:28,860] [INFO] Task started: fastANI
[2023-06-28 15:07:28,860] [INFO] Running command: fastANI --query /var/lib/cwl/stg7ab668e1-a5ba-4ca0-88d9-253647486d63/GCA_020851285.1_ASM2085128v1_genomic.fna.gz --refList GCA_020851285.1_ASM2085128v1_genomic.fna/target_genomes.txt --output GCA_020851285.1_ASM2085128v1_genomic.fna/fastani_result.tsv --threads 1
[2023-06-28 15:07:48,684] [INFO] Task succeeded: fastANI
[2023-06-28 15:07:48,685] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stgc44abe72-3660-4a2b-82f7-e3c8d5a9d96f/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-28 15:07:48,685] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stgc44abe72-3660-4a2b-82f7-e3c8d5a9d96f/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-28 15:07:48,699] [INFO] Found 18 fastANI hits (0 hits with ANI > threshold)
[2023-06-28 15:07:48,699] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-06-28 15:07:48,699] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Corallococcus exercitus	strain=AB043A	GCA_003611585.1	2316736	2316736	type	True	74.9603	77	1683	95	below_threshold
Schlegelella thermodepolymerans	strain=DSM 15344	GCA_002933415.1	215580	215580	type	True	74.9249	57	1683	95	below_threshold
Schlegelella thermodepolymerans	strain=DSM 15344	GCA_015476235.1	215580	215580	type	True	74.9215	58	1683	95	below_threshold
Thalassobaculum fulvum	strain=KCTC 42651	GCA_014652915.1	1633335	1633335	type	True	74.9016	97	1683	95	below_threshold
Pyxidicoccus fallax	strain=DSM 14698	GCA_012933655.1	394095	394095	type	True	74.899	103	1683	95	below_threshold
Zavarzinia compransoris	strain=DSM 1231	GCA_003173055.1	1264899	1264899	type	True	74.8372	52	1683	95	below_threshold
Saccharomonospora saliphila	strain=YIM 90502	GCA_000383795.1	369829	369829	type	True	74.829	51	1683	95	below_threshold
Roseospira goensis	strain=JA135	GCA_014197795.1	391922	391922	type	True	74.8289	65	1683	95	below_threshold
Azospirillum thiophilum	strain=DSM 21654	GCA_000960825.1	528244	528244	type	True	74.7691	73	1683	95	below_threshold
Pseudomonas oryzae	strain=KCTC 32247	GCA_900104805.1	1392877	1392877	type	True	74.7638	72	1683	95	below_threshold
Hypericibacter terrae	strain=R5913	GCA_008728855.1	2602015	2602015	type	True	74.7477	52	1683	95	below_threshold
Hypericibacter adhaerens	strain=R5959	GCA_008728835.1	2602016	2602016	type	True	74.7439	59	1683	95	below_threshold
Streptomyces parmotrematis	strain=Ptm05	GCA_019890615.1	2873249	2873249	type	True	74.7279	84	1683	95	below_threshold
Longimicrobium terrae	strain=CB-286315	GCA_013000925.1	1639882	1639882	type	True	74.7082	62	1683	95	below_threshold
Longimicrobium terrae	strain=DSM 29007	GCA_014202995.1	1639882	1639882	type	True	74.7082	62	1683	95	below_threshold
Longimicrobium terrae	strain=CECT 8660	GCA_014198875.1	1639882	1639882	type	True	74.7082	62	1683	95	below_threshold
Kineococcus xinjiangensis	strain=DSM 22857	GCA_002934625.1	512762	512762	type	True	74.6909	66	1683	95	below_threshold
Streptomyces palmae	strain=JCM 31289	GCA_004684805.1	1701085	1701085	type	True	74.6672	85	1683	95	below_threshold
--------------------------------------------------------------------------------
[2023-06-28 15:07:48,701] [INFO] DFAST Taxonomy check result was written to GCA_020851285.1_ASM2085128v1_genomic.fna/tc_result.tsv
[2023-06-28 15:07:48,701] [INFO] ===== Taxonomy check completed =====
[2023-06-28 15:07:48,702] [INFO] ===== Start completeness check using CheckM =====
[2023-06-28 15:07:48,702] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stgc44abe72-3660-4a2b-82f7-e3c8d5a9d96f/dqc_reference/checkm_data
[2023-06-28 15:07:48,703] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-28 15:07:48,751] [INFO] Task started: CheckM
[2023-06-28 15:07:48,751] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_020851285.1_ASM2085128v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_020851285.1_ASM2085128v1_genomic.fna/checkm_input GCA_020851285.1_ASM2085128v1_genomic.fna/checkm_result
[2023-06-28 15:08:31,283] [INFO] Task succeeded: CheckM
[2023-06-28 15:08:31,284] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-06-28 15:08:31,302] [INFO] ===== Completeness check finished =====
[2023-06-28 15:08:31,303] [INFO] ===== Start GTDB Search =====
[2023-06-28 15:08:31,303] [INFO] Query marker FASTA already exists. Will reuse it. (GCA_020851285.1_ASM2085128v1_genomic.fna/markers.fasta)
[2023-06-28 15:08:31,303] [INFO] Task started: Blastn
[2023-06-28 15:08:31,303] [INFO] Running command: blastn -query GCA_020851285.1_ASM2085128v1_genomic.fna/markers.fasta -db /var/lib/cwl/stgc44abe72-3660-4a2b-82f7-e3c8d5a9d96f/dqc_reference/reference_markers_gtdb.fasta -out GCA_020851285.1_ASM2085128v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-28 15:08:32,148] [INFO] Task succeeded: Blastn
[2023-06-28 15:08:32,152] [INFO] Selected 26 target genomes.
[2023-06-28 15:08:32,152] [INFO] Target genome list was writen to GCA_020851285.1_ASM2085128v1_genomic.fna/target_genomes_gtdb.txt
[2023-06-28 15:08:32,186] [INFO] Task started: fastANI
[2023-06-28 15:08:32,186] [INFO] Running command: fastANI --query /var/lib/cwl/stg7ab668e1-a5ba-4ca0-88d9-253647486d63/GCA_020851285.1_ASM2085128v1_genomic.fna.gz --refList GCA_020851285.1_ASM2085128v1_genomic.fna/target_genomes_gtdb.txt --output GCA_020851285.1_ASM2085128v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-28 15:08:48,839] [INFO] Task succeeded: fastANI
[2023-06-28 15:08:48,853] [INFO] Found 18 fastANI hits (1 hits with ANI > circumscription radius)
[2023-06-28 15:08:48,853] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCA_016712505.1	s__JADJQY01 sp016712505	99.0871	1563	1683	d__Bacteria;p__Eisenbacteria;c__RBG-16-71-46;o__CAIMUX01;f__CAIMUX01;g__JADJQY01	95.0	N/A	N/A	N/A	N/A	1	conclusive
GCA_016867495.1	s__VGJL01 sp016867495	76.1845	64	1683	d__Bacteria;p__Eisenbacteria;c__RBG-16-71-46;o__CAIMUX01;f__VGJL01;g__VGJL01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_014729815.1	s__WJJG01 sp014729815	75.7602	68	1683	d__Bacteria;p__Eisenbacteria;c__RBG-16-71-46;o__CAIMUX01;f__WJJG01;g__WJJG01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016867695.1	s__VGIY01 sp016867695	75.6596	88	1683	d__Bacteria;p__Eisenbacteria;c__RBG-16-71-46;o__CAIMUX01;f__WJJG01;g__VGIY01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_004298115.1	s__RBG-13-68-16 sp004298115	75.2456	78	1683	d__Bacteria;p__Acidobacteriota;c__Thermoanaerobaculia;o__Thermoanaerobaculales;f__Thermoanaerobaculaceae;g__RBG-13-68-16	95.0	N/A	N/A	N/A	N/A	1	-
GCA_011357805.1	s__DSQF01 sp011357805	75.1941	105	1683	d__Bacteria;p__Eisenbacteria;c__RBG-16-71-46;o__RBG-16-71-46;f__RBG-16-71-46;g__DSQF01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016215625.1	s__CAIXRL01 sp016215625	75.1315	96	1683	d__Bacteria;p__Eisenbacteria;c__RBG-16-71-46;o__RBG-16-71-46;f__RBG-16-71-46;g__CAIXRL01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_015489145.1	s__S143-5 sp015489145	75.1241	77	1683	d__Bacteria;p__Myxococcota_A;c__UBA796;o__UBA796;f__UBA2385;g__S143-5	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016869145.1	s__VGEU01 sp016869145	74.9754	63	1683	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__MPNO01;f__VGEU01;g__VGEU01	95.0	N/A	N/A	N/A	N/A	1	-
GCF_014652915.1	s__Thalassobaculum_A fulvum	74.9016	97	1683	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Thalassobaculales;f__Thalassobaculaceae;g__Thalassobaculum_A	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000383795.1	s__Saccharomonospora saliphila	74.829	51	1683	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Mycobacteriales;f__Pseudonocardiaceae;g__Saccharomonospora	95.0	N/A	N/A	N/A	N/A	1	-
GCA_001788415.1	s__UBA12499 sp001788415	74.7981	60	1683	d__Bacteria;p__Methylomirabilota;c__Methylomirabilia;o__Rokubacteriales;f__CSP1-6;g__UBA12499	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016188005.1	s__UBA12499 sp016188005	74.7724	62	1683	d__Bacteria;p__Methylomirabilota;c__Methylomirabilia;o__Rokubacteriales;f__CSP1-6;g__UBA12499	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900104805.1	s__Pseudomonas_K oryzae	74.7638	72	1683	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas_K	95.0	N/A	N/A	N/A	N/A	1	-
GCF_004795105.1	s__Streptomyces sp004795105	74.7264	90	1683	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Streptomycetales;f__Streptomycetaceae;g__Streptomyces	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001509485.1	s__Streptomyces sp001509485	74.7093	105	1683	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Streptomycetales;f__Streptomycetaceae;g__Streptomyces	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002934625.1	s__Kineococcus xinjiangensis	74.6894	65	1683	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Actinomycetales;f__Kineococcaceae;g__Kineococcus	95.0	N/A	N/A	N/A	N/A	1	-
GCF_004522095.1	s__Nocardioides sp004522095	74.6026	65	1683	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Propionibacteriales;f__Nocardioidaceae;g__Nocardioides	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2023-06-28 15:08:48,868] [INFO] GTDB search result was written to GCA_020851285.1_ASM2085128v1_genomic.fna/result_gtdb.tsv
[2023-06-28 15:08:48,869] [INFO] ===== GTDB Search completed =====
[2023-06-28 15:08:48,873] [INFO] DFAST_QC result json was written to GCA_020851285.1_ASM2085128v1_genomic.fna/dqc_result.json
[2023-06-28 15:08:48,873] [INFO] DFAST_QC completed!
[2023-06-28 15:08:48,874] [INFO] Total running time: 0h1m36s