[2023-06-13 17:55:51,706] [INFO] DFAST_QC pipeline started.
[2023-06-13 17:55:51,708] [INFO] DFAST_QC version: 0.5.7
[2023-06-13 17:55:51,709] [INFO] DQC Reference Directory: /var/lib/cwl/stg5af6c315-a8f4-4d0b-a5c5-895f351d14c5/dqc_reference
[2023-06-13 17:55:52,978] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-13 17:55:52,979] [INFO] Task started: Prodigal
[2023-06-13 17:55:52,979] [INFO] Running command: gunzip -c /var/lib/cwl/stg2cf5d5d9-86f5-4e2e-8180-83a43d403cf5/GCA_022563015.1_ASM2256301v1_genomic.fna.gz | prodigal -d GCA_022563015.1_ASM2256301v1_genomic.fna/cds.fna -a GCA_022563015.1_ASM2256301v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-13 17:56:00,914] [INFO] Task succeeded: Prodigal
[2023-06-13 17:56:00,915] [INFO] Task started: HMMsearch
[2023-06-13 17:56:00,915] [INFO] Running command: hmmsearch --tblout GCA_022563015.1_ASM2256301v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg5af6c315-a8f4-4d0b-a5c5-895f351d14c5/dqc_reference/reference_markers.hmm GCA_022563015.1_ASM2256301v1_genomic.fna/protein.faa > /dev/null
[2023-06-13 17:56:01,108] [INFO] Task succeeded: HMMsearch
[2023-06-13 17:56:01,110] [INFO] Found 6/6 markers.
[2023-06-13 17:56:01,138] [INFO] Query marker FASTA was written to GCA_022563015.1_ASM2256301v1_genomic.fna/markers.fasta
[2023-06-13 17:56:01,138] [INFO] Task started: Blastn
[2023-06-13 17:56:01,138] [INFO] Running command: blastn -query GCA_022563015.1_ASM2256301v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg5af6c315-a8f4-4d0b-a5c5-895f351d14c5/dqc_reference/reference_markers.fasta -out GCA_022563015.1_ASM2256301v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-13 17:56:01,749] [INFO] Task succeeded: Blastn
[2023-06-13 17:56:01,756] [INFO] Selected 25 target genomes.
[2023-06-13 17:56:01,757] [INFO] Target genome list was writen to GCA_022563015.1_ASM2256301v1_genomic.fna/target_genomes.txt
[2023-06-13 17:56:01,765] [INFO] Task started: fastANI
[2023-06-13 17:56:01,765] [INFO] Running command: fastANI --query /var/lib/cwl/stg2cf5d5d9-86f5-4e2e-8180-83a43d403cf5/GCA_022563015.1_ASM2256301v1_genomic.fna.gz --refList GCA_022563015.1_ASM2256301v1_genomic.fna/target_genomes.txt --output GCA_022563015.1_ASM2256301v1_genomic.fna/fastani_result.tsv --threads 1
[2023-06-13 17:56:20,272] [INFO] Task succeeded: fastANI
[2023-06-13 17:56:20,272] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg5af6c315-a8f4-4d0b-a5c5-895f351d14c5/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-13 17:56:20,273] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg5af6c315-a8f4-4d0b-a5c5-895f351d14c5/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-13 17:56:20,281] [INFO] Found 8 fastANI hits (0 hits with ANI > threshold)
[2023-06-13 17:56:20,281] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-06-13 17:56:20,281] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Mucisphaera calidilacus	strain=Pan265	GCA_007748075.1	2527982	2527982	type	True	76.1311	50	794	95	below_threshold
Phycisphaera mikurensis	strain=DSM 103959	GCA_014207395.1	547188	547188	type	True	75.738	72	794	95	below_threshold
Phycisphaera mikurensis	strain=NBRC 102666	GCA_000284115.1	547188	547188	type	True	75.7329	72	794	95	below_threshold
Paludisphaera soli	strain=JC670	GCA_011064595.1	2712865	2712865	type	True	75.2146	52	794	95	below_threshold
Streptacidiphilus carbonis	strain=NBRC 100919	GCA_000787775.1	105422	105422	type	True	74.7963	60	794	95	below_threshold
Streptomyces sabulosicollis	strain=PRKS01-29	GCA_016103465.1	2715963	2715963	type	True	74.6669	50	794	95	below_threshold
Streptacidiphilus albus	strain=JL83	GCA_000744705.1	105425	105425	type	True	74.589	70	794	95	below_threshold
Streptacidiphilus albus	strain=NBRC 100918	GCA_000787755.1	105425	105425	type	True	74.5873	72	794	95	below_threshold
--------------------------------------------------------------------------------
[2023-06-13 17:56:20,283] [INFO] DFAST Taxonomy check result was written to GCA_022563015.1_ASM2256301v1_genomic.fna/tc_result.tsv
[2023-06-13 17:56:20,285] [INFO] ===== Taxonomy check completed =====
[2023-06-13 17:56:20,286] [INFO] ===== Start completeness check using CheckM =====
[2023-06-13 17:56:20,286] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg5af6c315-a8f4-4d0b-a5c5-895f351d14c5/dqc_reference/checkm_data
[2023-06-13 17:56:20,287] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-13 17:56:20,320] [INFO] Task started: CheckM
[2023-06-13 17:56:20,321] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_022563015.1_ASM2256301v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_022563015.1_ASM2256301v1_genomic.fna/checkm_input GCA_022563015.1_ASM2256301v1_genomic.fna/checkm_result
[2023-06-13 17:56:47,921] [INFO] Task succeeded: CheckM
[2023-06-13 17:56:47,922] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 95.83%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-06-13 17:56:47,941] [INFO] ===== Completeness check finished =====
[2023-06-13 17:56:47,941] [INFO] ===== Start GTDB Search =====
[2023-06-13 17:56:47,941] [INFO] Query marker FASTA already exists. Will reuse it. (GCA_022563015.1_ASM2256301v1_genomic.fna/markers.fasta)
[2023-06-13 17:56:47,941] [INFO] Task started: Blastn
[2023-06-13 17:56:47,942] [INFO] Running command: blastn -query GCA_022563015.1_ASM2256301v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg5af6c315-a8f4-4d0b-a5c5-895f351d14c5/dqc_reference/reference_markers_gtdb.fasta -out GCA_022563015.1_ASM2256301v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-13 17:56:48,802] [INFO] Task succeeded: Blastn
[2023-06-13 17:56:48,806] [INFO] Selected 20 target genomes.
[2023-06-13 17:56:48,807] [INFO] Target genome list was writen to GCA_022563015.1_ASM2256301v1_genomic.fna/target_genomes_gtdb.txt
[2023-06-13 17:56:48,818] [INFO] Task started: fastANI
[2023-06-13 17:56:48,819] [INFO] Running command: fastANI --query /var/lib/cwl/stg2cf5d5d9-86f5-4e2e-8180-83a43d403cf5/GCA_022563015.1_ASM2256301v1_genomic.fna.gz --refList GCA_022563015.1_ASM2256301v1_genomic.fna/target_genomes_gtdb.txt --output GCA_022563015.1_ASM2256301v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-13 17:57:00,122] [INFO] Task succeeded: fastANI
[2023-06-13 17:57:00,137] [INFO] Found 17 fastANI hits (0 hits with ANI > circumscription radius)
[2023-06-13 17:57:00,138] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCA_014584315.1	s__CAADGN01 sp014584315	77.1179	140	794	d__Bacteria;p__Planctomycetota;c__Phycisphaerae;o__Phycisphaerales;f__UBA1924;g__CAADGN01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_900696545.1	s__CAADGN01 sp900696545	77.0151	124	794	d__Bacteria;p__Planctomycetota;c__Phycisphaerae;o__Phycisphaerales;f__UBA1924;g__CAADGN01	95.0	99.87	99.83	0.94	0.92	3	-
GCA_008363215.1	s__J022 sp008363215	76.956	94	794	d__Bacteria;p__Planctomycetota;c__Phycisphaerae;o__Phycisphaerales;f__UBA1924;g__J022	95.0	99.97	99.97	0.97	0.97	2	-
GCA_007124345.1	s__SLFH01 sp007124345	76.9515	95	794	d__Bacteria;p__Planctomycetota;c__Phycisphaerae;o__Phycisphaerales;f__UBA1924;g__SLFH01	95.0	98.92	98.92	0.84	0.84	2	-
GCA_013360675.1	s__JABWBB01 sp013360675	76.7045	99	794	d__Bacteria;p__Planctomycetota;c__Phycisphaerae;o__Phycisphaerales;f__UBA1924;g__JABWBB01	95.0	95.57	95.57	0.94	0.94	2	-
GCA_002429045.1	s__UBA6054 sp002429045	76.6747	102	794	d__Bacteria;p__Planctomycetota;c__Phycisphaerae;o__Phycisphaerales;f__UBA1924;g__UBA6054	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016742075.1	s__JACVCL01 sp016742075	76.6618	87	794	d__Bacteria;p__Planctomycetota;c__Phycisphaerae;o__Phycisphaerales;f__UBA1924;g__JACVCL01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016793195.1	s__JABWBC01 sp016793195	76.6489	110	794	d__Bacteria;p__Planctomycetota;c__Phycisphaerae;o__Phycisphaerales;f__UBA1924;g__JABWBC01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_007693785.1	s__RECY01 sp007693785	76.6266	123	794	d__Bacteria;p__Planctomycetota;c__Phycisphaerae;o__Phycisphaerales;f__UBA1924;g__RECY01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_003696675.1	s__UBA6054 sp003696675	76.6177	90	794	d__Bacteria;p__Planctomycetota;c__Phycisphaerae;o__Phycisphaerales;f__UBA1924;g__UBA6054	95.0	N/A	N/A	N/A	N/A	1	-
GCA_013360695.1	s__JABWBC01 sp013360695	76.6162	118	794	d__Bacteria;p__Planctomycetota;c__Phycisphaerae;o__Phycisphaerales;f__UBA1924;g__JABWBC01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_007694925.1	s__SLFH01 sp007694925	76.5922	113	794	d__Bacteria;p__Planctomycetota;c__Phycisphaerae;o__Phycisphaerales;f__UBA1924;g__SLFH01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_013285505.1	s__PNC22 sp013285505	76.5798	141	794	d__Bacteria;p__Planctomycetota;c__Phycisphaerae;o__Phycisphaerales;f__UBA1924;g__PNC22	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016709445.1	s__PNC22 sp016709445	76.4454	120	794	d__Bacteria;p__Planctomycetota;c__Phycisphaerae;o__Phycisphaerales;f__UBA1924;g__PNC22	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016794865.1	s__JAEUJB01 sp016794865	76.3762	78	794	d__Bacteria;p__Planctomycetota;c__Phycisphaerae;o__Phycisphaerales;f__UBA1924;g__JAEUJB01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_007693765.1	s__RECZ01 sp007693765	76.373	109	794	d__Bacteria;p__Planctomycetota;c__Phycisphaerae;o__Phycisphaerales;f__UBA1924;g__RECZ01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016794945.1	s__JAEUIX01 sp016794945	76.2151	76	794	d__Bacteria;p__Planctomycetota;c__Phycisphaerae;o__Phycisphaerales;f__UBA1924;g__JAEUIX01	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2023-06-13 17:57:00,142] [INFO] GTDB search result was written to GCA_022563015.1_ASM2256301v1_genomic.fna/result_gtdb.tsv
[2023-06-13 17:57:00,143] [INFO] ===== GTDB Search completed =====
[2023-06-13 17:57:00,148] [INFO] DFAST_QC result json was written to GCA_022563015.1_ASM2256301v1_genomic.fna/dqc_result.json
[2023-06-13 17:57:00,148] [INFO] DFAST_QC completed!
[2023-06-13 17:57:00,148] [INFO] Total running time: 0h1m8s