[2023-06-13 00:45:23,060] [INFO] DFAST_QC pipeline started.
[2023-06-13 00:45:23,062] [INFO] DFAST_QC version: 0.5.7
[2023-06-13 00:45:23,062] [INFO] DQC Reference Directory: /var/lib/cwl/stgb1d2ca0f-fb8f-4599-8574-62e2db71df32/dqc_reference
[2023-06-13 00:45:24,331] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-13 00:45:24,332] [INFO] Task started: Prodigal
[2023-06-13 00:45:24,332] [INFO] Running command: gunzip -c /var/lib/cwl/stgd8426978-b6aa-4129-b4a4-a5dd0bb7403e/GCA_022735975.1_ASM2273597v1_genomic.fna.gz | prodigal -d GCA_022735975.1_ASM2273597v1_genomic.fna/cds.fna -a GCA_022735975.1_ASM2273597v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-13 00:45:34,688] [INFO] Task succeeded: Prodigal
[2023-06-13 00:45:34,689] [INFO] Task started: HMMsearch
[2023-06-13 00:45:34,689] [INFO] Running command: hmmsearch --tblout GCA_022735975.1_ASM2273597v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stgb1d2ca0f-fb8f-4599-8574-62e2db71df32/dqc_reference/reference_markers.hmm GCA_022735975.1_ASM2273597v1_genomic.fna/protein.faa > /dev/null
[2023-06-13 00:45:35,049] [INFO] Task succeeded: HMMsearch
[2023-06-13 00:45:35,050] [INFO] Found 6/6 markers.
[2023-06-13 00:45:35,096] [INFO] Query marker FASTA was written to GCA_022735975.1_ASM2273597v1_genomic.fna/markers.fasta
[2023-06-13 00:45:35,096] [INFO] Task started: Blastn
[2023-06-13 00:45:35,096] [INFO] Running command: blastn -query GCA_022735975.1_ASM2273597v1_genomic.fna/markers.fasta -db /var/lib/cwl/stgb1d2ca0f-fb8f-4599-8574-62e2db71df32/dqc_reference/reference_markers.fasta -out GCA_022735975.1_ASM2273597v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-13 00:45:36,184] [INFO] Task succeeded: Blastn
[2023-06-13 00:45:36,188] [INFO] Selected 14 target genomes.
[2023-06-13 00:45:36,189] [INFO] Target genome list was writen to GCA_022735975.1_ASM2273597v1_genomic.fna/target_genomes.txt
[2023-06-13 00:45:36,193] [INFO] Task started: fastANI
[2023-06-13 00:45:36,194] [INFO] Running command: fastANI --query /var/lib/cwl/stgd8426978-b6aa-4129-b4a4-a5dd0bb7403e/GCA_022735975.1_ASM2273597v1_genomic.fna.gz --refList GCA_022735975.1_ASM2273597v1_genomic.fna/target_genomes.txt --output GCA_022735975.1_ASM2273597v1_genomic.fna/fastani_result.tsv --threads 1
[2023-06-13 00:45:51,803] [INFO] Task succeeded: fastANI
[2023-06-13 00:45:51,804] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stgb1d2ca0f-fb8f-4599-8574-62e2db71df32/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-13 00:45:51,804] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stgb1d2ca0f-fb8f-4599-8574-62e2db71df32/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-13 00:45:51,817] [INFO] Found 14 fastANI hits (0 hits with ANI > threshold)
[2023-06-13 00:45:51,817] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-06-13 00:45:51,818] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Caulobacter segnis	strain=TK0059	GCA_003015125.1	88688	88688	type	True	85.8939	971	1347	95	below_threshold
Caulobacter segnis	strain=ATCC 21756	GCA_000092285.1	88688	88688	type	True	85.8673	975	1347	95	below_threshold
Caulobacter vibrioides	strain=DSM 9893	GCA_002858865.1	155892	155892	type	True	85.4298	887	1347	95	below_threshold
Caulobacter zeae	strain=410	GCA_002858925.1	2055137	2055137	type	True	83.644	855	1347	95	below_threshold
Caulobacter radicis	strain=695	GCA_003094615.1	2172650	2172650	type	True	83.5896	868	1347	95	below_threshold
Caulobacter flavus	strain=RHGG3	GCA_003722335.1	1679497	1679497	type	True	83.564	884	1347	95	below_threshold
Caulobacter flavus	strain=CGMCC1 15093	GCA_002858845.1	1679497	1679497	type	True	83.466	901	1347	95	below_threshold
Caulobacter endophyticus	strain=774	GCA_003116815.1	2172652	2172652	type	True	83.4314	835	1347	95	below_threshold
Caulobacter rhizosphaerae	strain=KCTC 52515	GCA_010977555.1	2010972	2010972	type	True	83.2097	896	1347	95	below_threshold
Caulobacter rhizosphaerae	strain=CGMCC 1.15915	GCA_014645055.1	2010972	2010972	type	True	83.1753	889	1347	95	below_threshold
Phenylobacterium aquaticum	strain=KACC 18306	GCA_022695515.1	1763816	1763816	type	True	79.3744	615	1347	95	below_threshold
Phenylobacterium glaciei	strain=20VBR1	GCA_016772415.2	2803784	2803784	type	True	79.043	509	1347	95	below_threshold
Brevundimonas vitisensis	strain=GR-TSA-9	GCA_016656965.1	2800818	2800818	type	True	77.6773	288	1347	95	below_threshold
Starkeya koreensis	strain=Jip08	GCA_023016525.1	266121	266121	type	True	76.2347	209	1347	95	below_threshold
--------------------------------------------------------------------------------
[2023-06-13 00:45:51,833] [INFO] DFAST Taxonomy check result was written to GCA_022735975.1_ASM2273597v1_genomic.fna/tc_result.tsv
[2023-06-13 00:45:51,834] [INFO] ===== Taxonomy check completed =====
[2023-06-13 00:45:51,835] [INFO] ===== Start completeness check using CheckM =====
[2023-06-13 00:45:51,835] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stgb1d2ca0f-fb8f-4599-8574-62e2db71df32/dqc_reference/checkm_data
[2023-06-13 00:45:51,837] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-13 00:45:51,884] [INFO] Task started: CheckM
[2023-06-13 00:45:51,884] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_022735975.1_ASM2273597v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_022735975.1_ASM2273597v1_genomic.fna/checkm_input GCA_022735975.1_ASM2273597v1_genomic.fna/checkm_result
[2023-06-13 00:46:25,181] [INFO] Task succeeded: CheckM
[2023-06-13 00:46:25,182] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 91.67%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-06-13 00:46:25,206] [INFO] ===== Completeness check finished =====
[2023-06-13 00:46:25,206] [INFO] ===== Start GTDB Search =====
[2023-06-13 00:46:25,206] [INFO] Query marker FASTA already exists. Will reuse it. (GCA_022735975.1_ASM2273597v1_genomic.fna/markers.fasta)
[2023-06-13 00:46:25,207] [INFO] Task started: Blastn
[2023-06-13 00:46:25,207] [INFO] Running command: blastn -query GCA_022735975.1_ASM2273597v1_genomic.fna/markers.fasta -db /var/lib/cwl/stgb1d2ca0f-fb8f-4599-8574-62e2db71df32/dqc_reference/reference_markers_gtdb.fasta -out GCA_022735975.1_ASM2273597v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-13 00:46:27,041] [INFO] Task succeeded: Blastn
[2023-06-13 00:46:27,046] [INFO] Selected 14 target genomes.
[2023-06-13 00:46:27,046] [INFO] Target genome list was writen to GCA_022735975.1_ASM2273597v1_genomic.fna/target_genomes_gtdb.txt
[2023-06-13 00:46:27,059] [INFO] Task started: fastANI
[2023-06-13 00:46:27,059] [INFO] Running command: fastANI --query /var/lib/cwl/stgd8426978-b6aa-4129-b4a4-a5dd0bb7403e/GCA_022735975.1_ASM2273597v1_genomic.fna.gz --refList GCA_022735975.1_ASM2273597v1_genomic.fna/target_genomes_gtdb.txt --output GCA_022735975.1_ASM2273597v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-13 00:46:42,738] [INFO] Task succeeded: fastANI
[2023-06-13 00:46:42,762] [INFO] Found 14 fastANI hits (1 hits with ANI > circumscription radius)
[2023-06-13 00:46:42,763] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_000372645.1	s__Caulobacter vibrioides_E	99.8333	1255	1347	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__Caulobacter	95.0	98.89	97.90	0.89	0.86	5	conclusive
GCA_903900155.1	s__Caulobacter sp903900155	86.6099	977	1347	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__Caulobacter	95.0	99.93	99.90	0.95	0.93	3	-
GCA_017744445.1	s__Caulobacter sp017744445	86.5291	992	1347	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__Caulobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000799245.1	s__Caulobacter sp000799245	86.3769	1006	1347	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__Caulobacter	95.0	96.44	96.44	0.92	0.92	3	-
GCF_002742635.1	s__Caulobacter sp002742635	86.356	998	1347	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__Caulobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003931565.1	s__Caulobacter sp003931565	86.338	1020	1347	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__Caulobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001556515.1	s__Caulobacter sp001556515	86.178	873	1347	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__Caulobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCA_003243465.1	s__Caulobacter segnis_A	86.0915	910	1347	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__Caulobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000092285.1	s__Caulobacter segnis	85.844	977	1347	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__Caulobacter	95.0	100.00	100.00	1.00	1.00	2	-
GCF_014207215.1	s__Caulobacter sp014207215	85.7967	972	1347	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__Caulobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCF_013181195.1	s__Caulobacter sp013181195	85.6374	959	1347	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__Caulobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002858865.1	s__Caulobacter vibrioides	85.42	888	1347	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__Caulobacter	95.0	98.80	98.06	0.96	0.95	6	-
GCF_002310375.3	s__Caulobacter vibrioides_D	85.3153	905	1347	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__Caulobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCA_002280275.1	s__Caulobacter vibrioides_A	84.6925	663	1347	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__Caulobacter	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2023-06-13 00:46:42,765] [INFO] GTDB search result was written to GCA_022735975.1_ASM2273597v1_genomic.fna/result_gtdb.tsv
[2023-06-13 00:46:42,765] [INFO] ===== GTDB Search completed =====
[2023-06-13 00:46:42,769] [INFO] DFAST_QC result json was written to GCA_022735975.1_ASM2273597v1_genomic.fna/dqc_result.json
[2023-06-13 00:46:42,769] [INFO] DFAST_QC completed!
[2023-06-13 00:46:42,769] [INFO] Total running time: 0h1m20s