[2024-01-24 11:05:34,608] [INFO] DFAST_QC pipeline started.
[2024-01-24 11:05:34,610] [INFO] DFAST_QC version: 0.5.7
[2024-01-24 11:05:34,610] [INFO] DQC Reference Directory: /var/lib/cwl/stgbbff268d-89d1-4273-a0a6-24618c913845/dqc_reference
[2024-01-24 11:05:35,789] [INFO] ===== Start taxonomy check using ANI =====
[2024-01-24 11:05:35,789] [INFO] Task started: Prodigal
[2024-01-24 11:05:35,790] [INFO] Running command: gunzip -c /var/lib/cwl/stgd53628bd-ea9e-4228-a7d5-9666a34d42d1/GCF_004216635.1_ASM421663v1_genomic.fna.gz | prodigal -d GCF_004216635.1_ASM421663v1_genomic.fna/cds.fna -a GCF_004216635.1_ASM421663v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2024-01-24 11:05:49,310] [INFO] Task succeeded: Prodigal
[2024-01-24 11:05:49,311] [INFO] Task started: HMMsearch
[2024-01-24 11:05:49,311] [INFO] Running command: hmmsearch --tblout GCF_004216635.1_ASM421663v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stgbbff268d-89d1-4273-a0a6-24618c913845/dqc_reference/reference_markers.hmm GCF_004216635.1_ASM421663v1_genomic.fna/protein.faa > /dev/null
[2024-01-24 11:05:49,714] [INFO] Task succeeded: HMMsearch
[2024-01-24 11:05:49,716] [INFO] Found 6/6 markers.
[2024-01-24 11:05:49,757] [INFO] Query marker FASTA was written to GCF_004216635.1_ASM421663v1_genomic.fna/markers.fasta
[2024-01-24 11:05:49,758] [INFO] Task started: Blastn
[2024-01-24 11:05:49,758] [INFO] Running command: blastn -query GCF_004216635.1_ASM421663v1_genomic.fna/markers.fasta -db /var/lib/cwl/stgbbff268d-89d1-4273-a0a6-24618c913845/dqc_reference/reference_markers.fasta -out GCF_004216635.1_ASM421663v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 11:05:50,766] [INFO] Task succeeded: Blastn
[2024-01-24 11:05:50,770] [INFO] Selected 18 target genomes.
[2024-01-24 11:05:50,771] [INFO] Target genome list was writen to GCF_004216635.1_ASM421663v1_genomic.fna/target_genomes.txt
[2024-01-24 11:05:50,794] [INFO] Task started: fastANI
[2024-01-24 11:05:50,794] [INFO] Running command: fastANI --query /var/lib/cwl/stgd53628bd-ea9e-4228-a7d5-9666a34d42d1/GCF_004216635.1_ASM421663v1_genomic.fna.gz --refList GCF_004216635.1_ASM421663v1_genomic.fna/target_genomes.txt --output GCF_004216635.1_ASM421663v1_genomic.fna/fastani_result.tsv --threads 1
[2024-01-24 11:06:10,828] [INFO] Task succeeded: fastANI
[2024-01-24 11:06:10,829] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stgbbff268d-89d1-4273-a0a6-24618c913845/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2024-01-24 11:06:10,829] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stgbbff268d-89d1-4273-a0a6-24618c913845/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2024-01-24 11:06:10,849] [INFO] Found 18 fastANI hits (3 hits with ANI > threshold)
[2024-01-24 11:06:10,849] [INFO] The taxonomy check result is classified as 'conclusive'.
[2024-01-24 11:06:10,849] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Variibacter gotjawalensis	strain=DSM 29671	GCA_004216635.1	1333996	1333996	type	True	100.0	1527	1527	95	conclusive
Variibacter gotjawalensis	strain=GJW-30	GCA_002355335.1	1333996	1333996	type	True	99.9998	1526	1527	95	conclusive
Variibacter gotjawalensis	strain=CECT 8514	GCA_011761465.1	1333996	1333996	type	True	99.9997	1527	1527	95	conclusive
Bradyrhizobium diazoefficiens	strain=USDA 110	GCA_001642675.1	1355477	1355477	type	True	77.7564	385	1527	95	below_threshold
Bradyrhizobium diazoefficiens	strain=USDA110	GCA_000011365.1	1355477	1355477	type	True	77.7427	381	1527	95	below_threshold
Rhodopseudomonas pseudopalustris	strain=DSM 123	GCA_900110435.1	1513892	1513892	type	True	77.694	341	1527	95	below_threshold
Bradyrhizobium viridifuturi	strain=SEMIA 690	GCA_001238275.1	1654716	1654716	type	True	77.5871	399	1527	95	below_threshold
Rhodoplanes roseus	strain=DSM 5909	GCA_003258865.1	29409	29409	type	True	77.5687	349	1527	95	below_threshold
Rhodoplanes elegans	strain=DSM 11907	GCA_003258805.1	29408	29408	type	True	77.5051	350	1527	95	below_threshold
Bradyrhizobium cajani	strain=1010	GCA_009759665.1	1928661	1928661	type	True	77.4663	398	1527	95	below_threshold
Bradyrhizobium sediminis	strain=S2-20-1	GCA_018736085.1	2840469	2840469	type	True	77.4586	318	1527	95	below_threshold
Bradyrhizobium betae	strain=CECT 5829	GCA_024806875.1	244734	244734	type	True	77.4152	382	1527	95	below_threshold
Bradyrhizobium embrapense	strain=SEMIA 6208	GCA_001189235.2	630921	630921	type	True	77.4109	420	1527	95	below_threshold
Bradyrhizobium aeschynomenes	strain=83002	GCA_013178945.1	2734909	2734909	type	True	77.4035	378	1527	95	below_threshold
Rhodoplanes piscinae	strain=DSM 19946	GCA_003258855.1	444923	444923	type	True	77.2685	307	1527	95	below_threshold
Bradyrhizobium oropedii	strain=Pear76	GCA_020889685.1	1571201	1571201	type	True	77.2349	414	1527	95	below_threshold
Azorhizobium caulinodans	strain=ORS 571	GCA_000010525.1	7	7	type	True	77.1625	197	1527	95	below_threshold
Rhizobium rhizolycopersici	strain=DBTS2	GCA_013378445.1	2746702	2746702	type	True	76.8972	163	1527	95	below_threshold
--------------------------------------------------------------------------------
[2024-01-24 11:06:10,851] [INFO] DFAST Taxonomy check result was written to GCF_004216635.1_ASM421663v1_genomic.fna/tc_result.tsv
[2024-01-24 11:06:10,851] [INFO] ===== Taxonomy check completed =====
[2024-01-24 11:06:10,852] [INFO] ===== Start completeness check using CheckM =====
[2024-01-24 11:06:10,852] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stgbbff268d-89d1-4273-a0a6-24618c913845/dqc_reference/checkm_data
[2024-01-24 11:06:10,853] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2024-01-24 11:06:10,896] [INFO] Task started: CheckM
[2024-01-24 11:06:10,896] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCF_004216635.1_ASM421663v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCF_004216635.1_ASM421663v1_genomic.fna/checkm_input GCF_004216635.1_ASM421663v1_genomic.fna/checkm_result
[2024-01-24 11:06:52,103] [INFO] Task succeeded: CheckM
[2024-01-24 11:06:52,104] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2024-01-24 11:06:52,127] [INFO] ===== Completeness check finished =====
[2024-01-24 11:06:52,128] [INFO] ===== Start GTDB Search =====
[2024-01-24 11:06:52,128] [INFO] Query marker FASTA already exists. Will reuse it. (GCF_004216635.1_ASM421663v1_genomic.fna/markers.fasta)
[2024-01-24 11:06:52,129] [INFO] Task started: Blastn
[2024-01-24 11:06:52,129] [INFO] Running command: blastn -query GCF_004216635.1_ASM421663v1_genomic.fna/markers.fasta -db /var/lib/cwl/stgbbff268d-89d1-4273-a0a6-24618c913845/dqc_reference/reference_markers_gtdb.fasta -out GCF_004216635.1_ASM421663v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 11:06:53,991] [INFO] Task succeeded: Blastn
[2024-01-24 11:06:53,996] [INFO] Selected 25 target genomes.
[2024-01-24 11:06:53,996] [INFO] Target genome list was writen to GCF_004216635.1_ASM421663v1_genomic.fna/target_genomes_gtdb.txt
[2024-01-24 11:06:54,021] [INFO] Task started: fastANI
[2024-01-24 11:06:54,021] [INFO] Running command: fastANI --query /var/lib/cwl/stgd53628bd-ea9e-4228-a7d5-9666a34d42d1/GCF_004216635.1_ASM421663v1_genomic.fna.gz --refList GCF_004216635.1_ASM421663v1_genomic.fna/target_genomes_gtdb.txt --output GCF_004216635.1_ASM421663v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2024-01-24 11:07:19,543] [INFO] Task succeeded: fastANI
[2024-01-24 11:07:19,566] [INFO] Found 25 fastANI hits (1 hits with ANI > circumscription radius)
[2024-01-24 11:07:19,566] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_002355335.1	s__Variibacter gotjawalensis	99.9998	1526	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Variibacter	95.0	100.00	100.00	1.00	1.00	3	conclusive
GCA_018240595.1	s__Pseudolabrys sp018240595	78.043	367	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Pseudolabrys	95.0	N/A	N/A	N/A	N/A	1	-
GCA_017304555.1	s__JAFKKS01 sp017304555	77.9725	371	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__JAFKKS01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_018242205.1	s__Pseudolabrys sp018242205	77.8645	352	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Pseudolabrys	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000011365.1	s__Bradyrhizobium diazoefficiens	77.7425	381	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Bradyrhizobium	95.0	99.15	98.13	0.93	0.84	23	-
GCF_003367395.1	s__Pseudolabrys taiwanensis	77.7314	396	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Pseudolabrys	95.0	N/A	N/A	N/A	N/A	1	-
GCA_005884685.1	s__PALSA-894 sp005884685	77.7066	454	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__PALSA-894	95.0	N/A	N/A	N/A	N/A	1	-
GCF_004571025.1	s__Bradyrhizobium niftali	77.6738	389	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Bradyrhizobium	95.0	96.00	95.49	0.80	0.80	3	-
GCA_001899285.1	s__Pseudolabrys sp001899285	77.6612	363	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Pseudolabrys	95.0	99.99	99.99	1.00	1.00	2	-
GCA_003105195.1	s__FEB-22 sp003105195	77.6572	331	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__FEB-22	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003258865.1	s__Rhodoplanes roseus	77.5947	346	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Rhodoplanes	95.0	N/A	N/A	N/A	N/A	1	-
GCF_004570865.1	s__Bradyrhizobium frederickii	77.5861	394	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Bradyrhizobium	95.0	97.35	97.32	0.89	0.89	3	-
GCF_016653355.1	s__Rhodoplanes elegans	77.585	394	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Rhodoplanes	95.0	99.89	99.89	0.96	0.96	2	-
GCF_018130695.1	s__Bradyrhizobium jicamae_B	77.5703	407	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Bradyrhizobium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_011602485.1	s__Bradyrhizobium sp011602485	77.5377	391	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Bradyrhizobium	95.0	97.68	97.67	0.91	0.90	3	-
GCA_019187085.1	s__Pseudorhodoplanes sp019187085	77.5182	336	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Pseudorhodoplanes	95.0	98.90	98.90	0.85	0.85	2	-
GCF_003020115.1	s__Bradyrhizobium sp003020115	77.4785	361	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Bradyrhizobium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900114915.1	s__Bradyrhizobium sp900114915	77.4628	396	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Bradyrhizobium	95.0	97.76	97.31	0.89	0.88	8	-
GCA_903885555.1	s__Pseudolabrys sp903885555	77.4264	297	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Pseudolabrys	95.0	99.81	99.81	0.96	0.96	2	-
GCA_009694215.1	s__Z2-YC6860 sp009694215	77.4017	239	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Z2-YC6860	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016185205.1	s__Pseudolabrys sp016185205	77.3759	324	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Pseudolabrys	95.0	N/A	N/A	N/A	N/A	1	-
GCA_001899255.1	s__62-47 sp001899255	77.3519	284	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__62-47	95.0	100.00	100.00	1.00	1.00	2	-
GCF_000472925.1	s__Bradyrhizobium sp000472925	77.2834	371	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__Bradyrhizobium	95.0	N/A	N/A	N/A	N/A	1	-
GCA_003140315.1	s__PALSA-894 sp003140315	77.2262	414	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__PALSA-894	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016793595.1	s__Phreatobacter sp016793595	77.0176	290	1527	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Phreatobacteraceae;g__Phreatobacter	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2024-01-24 11:07:19,568] [INFO] GTDB search result was written to GCF_004216635.1_ASM421663v1_genomic.fna/result_gtdb.tsv
[2024-01-24 11:07:19,568] [INFO] ===== GTDB Search completed =====
[2024-01-24 11:07:19,573] [INFO] DFAST_QC result json was written to GCF_004216635.1_ASM421663v1_genomic.fna/dqc_result.json
[2024-01-24 11:07:19,573] [INFO] DFAST_QC completed!
[2024-01-24 11:07:19,573] [INFO] Total running time: 0h1m45s