[2024-01-24 13:37:40,814] [INFO] DFAST_QC pipeline started.
[2024-01-24 13:37:40,816] [INFO] DFAST_QC version: 0.5.7
[2024-01-24 13:37:40,816] [INFO] DQC Reference Directory: /var/lib/cwl/stgadd01ef0-219e-4c06-a3f9-b1f6c1bcaf33/dqc_reference
[2024-01-24 13:37:42,107] [INFO] ===== Start taxonomy check using ANI =====
[2024-01-24 13:37:42,108] [INFO] Task started: Prodigal
[2024-01-24 13:37:42,108] [INFO] Running command: gunzip -c /var/lib/cwl/stge0eaf2bc-100a-43e7-aa16-86720510825a/GCF_014201825.1_ASM1420182v1_genomic.fna.gz | prodigal -d GCF_014201825.1_ASM1420182v1_genomic.fna/cds.fna -a GCF_014201825.1_ASM1420182v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2024-01-24 13:37:51,866] [INFO] Task succeeded: Prodigal
[2024-01-24 13:37:51,867] [INFO] Task started: HMMsearch
[2024-01-24 13:37:51,867] [INFO] Running command: hmmsearch --tblout GCF_014201825.1_ASM1420182v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stgadd01ef0-219e-4c06-a3f9-b1f6c1bcaf33/dqc_reference/reference_markers.hmm GCF_014201825.1_ASM1420182v1_genomic.fna/protein.faa > /dev/null
[2024-01-24 13:37:52,122] [INFO] Task succeeded: HMMsearch
[2024-01-24 13:37:52,123] [INFO] Found 6/6 markers.
[2024-01-24 13:37:52,156] [INFO] Query marker FASTA was written to GCF_014201825.1_ASM1420182v1_genomic.fna/markers.fasta
[2024-01-24 13:37:52,156] [INFO] Task started: Blastn
[2024-01-24 13:37:52,156] [INFO] Running command: blastn -query GCF_014201825.1_ASM1420182v1_genomic.fna/markers.fasta -db /var/lib/cwl/stgadd01ef0-219e-4c06-a3f9-b1f6c1bcaf33/dqc_reference/reference_markers.fasta -out GCF_014201825.1_ASM1420182v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 13:37:52,977] [INFO] Task succeeded: Blastn
[2024-01-24 13:37:52,981] [INFO] Selected 17 target genomes.
[2024-01-24 13:37:52,982] [INFO] Target genome list was writen to GCF_014201825.1_ASM1420182v1_genomic.fna/target_genomes.txt
[2024-01-24 13:37:52,992] [INFO] Task started: fastANI
[2024-01-24 13:37:52,992] [INFO] Running command: fastANI --query /var/lib/cwl/stge0eaf2bc-100a-43e7-aa16-86720510825a/GCF_014201825.1_ASM1420182v1_genomic.fna.gz --refList GCF_014201825.1_ASM1420182v1_genomic.fna/target_genomes.txt --output GCF_014201825.1_ASM1420182v1_genomic.fna/fastani_result.tsv --threads 1
[2024-01-24 13:38:05,885] [INFO] Task succeeded: fastANI
[2024-01-24 13:38:05,886] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stgadd01ef0-219e-4c06-a3f9-b1f6c1bcaf33/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2024-01-24 13:38:05,886] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stgadd01ef0-219e-4c06-a3f9-b1f6c1bcaf33/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2024-01-24 13:38:05,905] [INFO] Found 17 fastANI hits (1 hits with ANI > threshold)
[2024-01-24 13:38:05,906] [INFO] The taxonomy check result is classified as 'conclusive'.
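The fastANI runs in this log write tab-separated results (fastani_result.tsv for the taxonomy check, fastani_result_gtdb.tsv for the GTDB search). fastANI documents five columns per output line: query genome, reference genome, ANI, count of bidirectional fragment mappings, and total query fragments; these correspond to the ani, matched_fragments and total_fragments columns in the DFAST_QC result tables. A minimal Python sketch for reading such a file is shown here, assuming the file has no header line. The function name parse_fastani and the reference file paths in the example are hypothetical; only the ANI and fragment values are copied from this log.

```python
import csv
import io
from typing import Dict, List, TextIO


def parse_fastani(handle: TextIO) -> List[Dict]:
    """Parse fastANI tab-separated output (assumed: no header) and sort hits by ANI, highest first."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        # Only the first five documented columns are used.
        query, reference, ani, matched, total = row[:5]
        matched_i, total_i = int(matched), int(total)
        hits.append({
            "query": query,
            "reference": reference,
            "ani": float(ani),
            "matched_fragments": matched_i,
            "total_fragments": total_i,
            # Share of query fragments that mapped to this reference.
            "matched_fraction": matched_i / total_i,
        })
    hits.sort(key=lambda h: h["ani"], reverse=True)
    return hits


# Hypothetical reference paths; ANI and fragment counts come from the taxonomy table in this log.
example = (
    "query.fna.gz\tref_GCA_000687875.1.fna.gz\t80.3007\t504\t944\n"
    "query.fna.gz\tref_GCA_014201825.1.fna.gz\t100.0\t944\t944\n"
)
hits = parse_fastani(io.StringIO(example))
for h in hits:
    print(h["reference"], h["ani"], round(h["matched_fraction"], 3))
```

Sorting by ANI puts the self-match (100.0 ANI, 944 of 944 fragments) first, which mirrors the row order of the result tables.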
[2024-01-24 13:38:05,906] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Acidocella aromatica	strain=DSM 27026	GCA_014201825.1	1303579	1303579	type	True	100.0	944	944	95	conclusive
Acidocella facilis	strain=ATCC 35904	GCA_000687875.1	525	525	type	True	80.3007	504	944	95	below_threshold
Acidocella aminolytica	strain=DSM 11237	GCA_900129125.1	33998	33998	type	True	78.9548	375	944	95	below_threshold
Acidiphilium multivorum	strain=AIU301	GCA_000964345.1	62140	62140	type	True	78.1923	259	944	95	below_threshold
Acidiphilium multivorum	strain=AIU301	GCA_000202835.1	62140	62140	type	True	78.1585	284	944	95	below_threshold
Roseomonas mucosa	strain=ATCC BAA-692	GCA_000622225.1	207340	207340	type	True	77.7399	240	944	95	below_threshold
Roseomonas rhizosphaerae	strain=YW11	GCA_002631185.1	1335062	1335062	type	True	77.617	265	944	95	below_threshold
Rhodovastum atsumiense	strain=G2-11	GCA_937425535.1	504468	504468	type	True	77.5685	270	944	95	below_threshold
Rhodovarius lipocyclicus	strain=CCUG 44693	GCA_009900765.1	268410	268410	type	True	77.5179	222	944	95	below_threshold
Roseomonas haemaphysalidis	strain=546	GCA_017355405.1	2768162	2768162	type	True	77.4961	237	944	95	below_threshold
Roseomonas oryzae	strain=KCTC 42542	GCA_008386565.1	1608942	1608942	type	True	77.4735	225	944	95	below_threshold
Roseomonas rubea	strain=MO17	GCA_016106015.1	2748666	2748666	type	True	77.4038	166	944	95	below_threshold
Roseomonas alkaliterrae	strain=DSM 25895	GCA_014199195.1	1452450	1452450	type	True	77.3684	218	944	95	below_threshold
Roseococcus microcysteis	strain=NIBR12	GCA_014764365.1	2771361	2771361	type	True	77.3517	224	944	95	below_threshold
Roseococcus pinisoli	strain=XZZS9	GCA_018413645.1	2835040	2835040	type	True	77.2892	204	944	95	below_threshold
Acidiphilium iwatense	strain=KCTC 23505	GCA_021556475.1	768198	768198	type	True	77.242	246	944	95	below_threshold
Falsiroseomonas frigidaquae	strain=JCM 15073	GCA_012163145.1	487318	487318	type	True	77.1115	230	944	95	below_threshold
--------------------------------------------------------------------------------
[2024-01-24 13:38:05,908] [INFO] DFAST Taxonomy check result was written to GCF_014201825.1_ASM1420182v1_genomic.fna/tc_result.tsv
[2024-01-24 13:38:05,908] [INFO] ===== Taxonomy check completed =====
[2024-01-24 13:38:05,908] [INFO] ===== Start completeness check using CheckM =====
[2024-01-24 13:38:05,909] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stgadd01ef0-219e-4c06-a3f9-b1f6c1bcaf33/dqc_reference/checkm_data
[2024-01-24 13:38:05,910] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2024-01-24 13:38:05,939] [INFO] Task started: CheckM
[2024-01-24 13:38:05,940] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCF_014201825.1_ASM1420182v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCF_014201825.1_ASM1420182v1_genomic.fna/checkm_input GCF_014201825.1_ASM1420182v1_genomic.fna/checkm_result
[2024-01-24 13:38:36,136] [INFO] Task succeeded: CheckM
[2024-01-24 13:38:36,138] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2024-01-24 13:38:36,159] [INFO] ===== Completeness check finished =====
[2024-01-24 13:38:36,159] [INFO] ===== Start GTDB Search =====
[2024-01-24 13:38:36,160] [INFO] Query marker FASTA already exists. Will reuse it. (GCF_014201825.1_ASM1420182v1_genomic.fna/markers.fasta)
[2024-01-24 13:38:36,160] [INFO] Task started: Blastn
[2024-01-24 13:38:36,161] [INFO] Running command: blastn -query GCF_014201825.1_ASM1420182v1_genomic.fna/markers.fasta -db /var/lib/cwl/stgadd01ef0-219e-4c06-a3f9-b1f6c1bcaf33/dqc_reference/reference_markers_gtdb.fasta -out GCF_014201825.1_ASM1420182v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 13:38:37,628] [INFO] Task succeeded: Blastn
[2024-01-24 13:38:37,634] [INFO] Selected 11 target genomes.
[2024-01-24 13:38:37,635] [INFO] Target genome list was writen to GCF_014201825.1_ASM1420182v1_genomic.fna/target_genomes_gtdb.txt
[2024-01-24 13:38:37,660] [INFO] Task started: fastANI
[2024-01-24 13:38:37,661] [INFO] Running command: fastANI --query /var/lib/cwl/stge0eaf2bc-100a-43e7-aa16-86720510825a/GCF_014201825.1_ASM1420182v1_genomic.fna.gz --refList GCF_014201825.1_ASM1420182v1_genomic.fna/target_genomes_gtdb.txt --output GCF_014201825.1_ASM1420182v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2024-01-24 13:38:44,597] [INFO] Task succeeded: fastANI
[2024-01-24 13:38:44,617] [INFO] Found 11 fastANI hits (1 hits with ANI > circumscription radius)
[2024-01-24 13:38:44,617] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_014201825.1	s__Acidocella aromatica	100.0	944	944	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Acidocella	95.0	N/A	N/A	N/A	N/A	1	conclusive
GCF_000687875.1	s__Acidocella facilis	80.3215	503	944	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Acidocella	95.0	98.02	98.02	0.93	0.93	2	-
GCA_018971725.1	s__Acidocella sp018971725	79.7388	295	944	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Acidocella	95.0	N/A	N/A	N/A	N/A	1	-
GCA_018971905.1	s__Acidocella sp018971905	79.287	243	944	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Acidocella	95.0	N/A	N/A	N/A	N/A	1	-
GCA_018971565.1	s__Acidocella sp018971565	79.2217	300	944	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Acidocella	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900129125.1	s__Acidocella aminolytica	78.9546	375	944	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Acidocella	95.0	99.99	99.99	0.99	0.99	2	-
GCA_003164135.1	s__Acidocella sp003164135	78.8453	381	944	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Acidocella	95.0	99.77	99.73	0.97	0.95	5	-
GCA_013044125.1	s__Acidocella sp013044125	78.7939	367	944	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Acidocella	95.0	N/A	N/A	N/A	N/A	1	-
GCA_003133265.1	s__PALSA-911 sp003133265	77.8062	175	944	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__PALSA-911	95.0	N/A	N/A	N/A	N/A	1	-
GCA_018240635.1	s__JAFEFJ01 sp018240635	77.4532	210	944	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__JAFEFJ01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_903924855.1	s__CAIQQQ01 sp903924855	77.3961	208	944	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__CAIQQQ01	95.0	99.77	99.77	0.89	0.89	2	-
--------------------------------------------------------------------------------
[2024-01-24 13:38:44,620] [INFO] GTDB search result was written to GCF_014201825.1_ASM1420182v1_genomic.fna/result_gtdb.tsv
[2024-01-24 13:38:44,620] [INFO] ===== GTDB Search completed =====
[2024-01-24 13:38:44,624] [INFO] DFAST_QC result json was written to GCF_014201825.1_ASM1420182v1_genomic.fna/dqc_result.json
[2024-01-24 13:38:44,624] [INFO] DFAST_QC completed!
[2024-01-24 13:38:44,625] [INFO] Total running time: 0h1m4s
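Both ANI-based steps in this log end the same way: the fastANI hits are compared against a cutoff of 95 (the default species threshold in the taxonomy check, because the species-specific threshold file was not found, and the ani_circumscription_radius in the GTDB search), exactly one hit clears it, and that hit is reported as 'conclusive'. The Python sketch below mimics that comparison under stated assumptions. It uses a strict ">" because the log messages read "ANI > threshold", and it treats "exactly one species above the cutoff" as the condition for an overall 'conclusive' call. The per-hit labels follow the taxonomy-check table; the GTDB table marks non-matching rows with "-" instead. DFAST_QC's actual decision logic is not shown in this log and may handle ties, several passing species, or species-specific thresholds differently. The names AniHit and classify_hits are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed default: the log reports a threshold of 95 for every species
# (species-specific file missing) and a 95.0 circumscription radius for GTDB.
DEFAULT_ANI_THRESHOLD = 95.0


@dataclass
class AniHit:
    species: str
    accession: str
    ani: float


def classify_hits(
    hits: List[AniHit], threshold: float = DEFAULT_ANI_THRESHOLD
) -> Tuple[List[Tuple[AniHit, str]], Optional[str]]:
    """Label each hit and derive an overall call (simplified, assumed rule)."""
    labelled = [
        (hit, "conclusive" if hit.ani > threshold else "below_threshold")
        for hit in hits
    ]
    passing_species = {hit.species for hit, label in labelled if label == "conclusive"}
    # Assumed rule: exactly one species above the cutoff -> 'conclusive'.
    # Any other outcome returns None, since this log does not show how DFAST_QC reports it.
    overall = "conclusive" if len(passing_species) == 1 else None
    return labelled, overall


# The first three rows of the taxonomy-check table in this log.
hits = [
    AniHit("Acidocella aromatica", "GCA_014201825.1", 100.0),
    AniHit("Acidocella facilis", "GCA_000687875.1", 80.3007),
    AniHit("Acidocella aminolytica", "GCA_900129125.1", 78.9548),
]
labelled, overall = classify_hits(hits)
for hit, label in labelled:
    print(hit.species, hit.ani, label)
print("overall:", overall)
```

Returning None for the unhandled cases keeps the sketch from inventing status names that do not appear in this run.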