[2023-06-17 06:44:38,535] [INFO] DFAST_QC pipeline started.
[2023-06-17 06:44:38,544] [INFO] DFAST_QC version: 0.5.7
[2023-06-17 06:44:38,544] [INFO] DQC Reference Directory: /var/lib/cwl/stg167ac4d0-0e12-46fa-9462-71c06b573cd6/dqc_reference
[2023-06-17 06:44:40,116] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-17 06:44:40,117] [INFO] Task started: Prodigal
[2023-06-17 06:44:40,118] [INFO] Running command: gunzip -c /var/lib/cwl/stg81bd9589-799a-4cf9-ac7b-daf00f9edf10/GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna.gz | prodigal -d GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/cds.fna -a GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-17 06:44:46,959] [INFO] Task succeeded: Prodigal
[2023-06-17 06:44:46,960] [INFO] Task started: HMMsearch
[2023-06-17 06:44:46,960] [INFO] Running command: hmmsearch --tblout GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg167ac4d0-0e12-46fa-9462-71c06b573cd6/dqc_reference/reference_markers.hmm GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/protein.faa > /dev/null
[2023-06-17 06:44:47,220] [INFO] Task succeeded: HMMsearch
[2023-06-17 06:44:47,222] [INFO] Found 6/6 markers.
[2023-06-17 06:44:47,251] [INFO] Query marker FASTA was written to GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/markers.fasta
[2023-06-17 06:44:47,251] [INFO] Task started: Blastn
[2023-06-17 06:44:47,251] [INFO] Running command: blastn -query GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/markers.fasta -db /var/lib/cwl/stg167ac4d0-0e12-46fa-9462-71c06b573cd6/dqc_reference/reference_markers.fasta -out GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-17 06:44:47,919] [INFO] Task succeeded: Blastn
[2023-06-17 06:44:47,924] [INFO] Selected 27 target genomes.
[2023-06-17 06:44:47,925] [INFO] Target genome list was writen to GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/target_genomes.txt
[2023-06-17 06:44:47,930] [INFO] Task started: fastANI
[2023-06-17 06:44:47,930] [INFO] Running command: fastANI --query /var/lib/cwl/stg81bd9589-799a-4cf9-ac7b-daf00f9edf10/GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna.gz --refList GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/target_genomes.txt --output GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/fastani_result.tsv --threads 1
[2023-06-17 06:45:04,898] [INFO] Task succeeded: fastANI
[2023-06-17 06:45:04,899] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg167ac4d0-0e12-46fa-9462-71c06b573cd6/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-17 06:45:04,899] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg167ac4d0-0e12-46fa-9462-71c06b573cd6/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-17 06:45:04,915] [INFO] Found 16 fastANI hits (0 hits with ANI > threshold)
[2023-06-17 06:45:04,916] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-06-17 06:45:04,916] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Clostridium fessum	strain=SNUG30386	GCA_003024715.1	2126740	2126740	type	True	78.4621	150	844	95	below_threshold
Enterocloster asparagiformis	strain=DSM 15981	GCA_025149125.1	333367	333367	type	True	77.3168	148	844	95	below_threshold
Enterocloster asparagiformis	strain=DSM 15981	GCA_000158075.1	333367	333367	type	True	77.2062	147	844	95	below_threshold
Hungatella hathewayi	strain=DSM 13479	GCA_000160095.1	154046	154046	suspected-type	True	77.0775	120	844	95	below_threshold
Hungatella hathewayi	strain=DSM 13479	GCA_025149285.1	154046	154046	suspected-type	True	77.0466	137	844	95	below_threshold
Enterocloster bolteae	strain=ATCC BAA-613	GCA_000154365.1	208479	208479	type	True	77.0183	110	844	95	below_threshold
Hungatella effluvii	strain=DSM 24995	GCA_003201875.1	1096246	1096246	type	True	76.9167	130	844	95	below_threshold
Lachnoclostridium pacaense	strain=Marseille-P3100	GCA_900566185.1	1917870	1917870	type	True	76.8999	55	844	95	below_threshold
Lacrimispora celerecrescens	strain=18A	GCA_002797975.1	29354	29354	type	True	76.742	78	844	95	below_threshold
Enterocloster clostridioformis	strain=ATCC 25537	GCA_900113155.1	1531	1531	type	True	76.7311	126	844	95	below_threshold
Enterocloster clostridioformis	strain=NCTC11224	GCA_900447015.1	1531	1531	suspected-type	True	76.6963	118	844	95	below_threshold
Enterocloster clostridioformis	strain=FDAARGOS_1529	GCA_020297485.1	1531	1531	suspected-type	True	76.6639	118	844	95	below_threshold
Lacrimispora sphenoides	strain=NCTC507	GCA_900461315.1	29370	29370	type	True	76.5172	93	844	95	below_threshold
Lacrimispora sphenoides	strain=ATCC 19403	GCA_900105615.1	29370	29370	type	True	76.4851	97	844	95	below_threshold
Roseburia intestinalis	strain=L1-82	GCA_000156535.1	166486	166486	type	True	76.0141	59	844	95	below_threshold
Eisenbergiella massiliensis	strain=AT11	GCA_900243045.1	1720294	1720294	type	True	75.9494	50	844	95	below_threshold
--------------------------------------------------------------------------------
[2023-06-17 06:45:04,918] [INFO] DFAST Taxonomy check result was written to GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/tc_result.tsv
[2023-06-17 06:45:04,919] [INFO] ===== Taxonomy check completed =====
[2023-06-17 06:45:04,919] [INFO] ===== Start completeness check using CheckM =====
[2023-06-17 06:45:04,919] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg167ac4d0-0e12-46fa-9462-71c06b573cd6/dqc_reference/checkm_data
[2023-06-17 06:45:04,921] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-17 06:45:04,955] [INFO] Task started: CheckM
[2023-06-17 06:45:04,956] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/checkm_input GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/checkm_result
[2023-06-17 06:45:31,345] [INFO] Task succeeded: CheckM
[2023-06-17 06:45:31,346] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 75.38%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-06-17 06:45:31,368] [INFO] ===== Completeness check finished =====
[2023-06-17 06:45:31,368] [INFO] ===== Start GTDB Search =====
[2023-06-17 06:45:31,369] [INFO] Query marker FASTA already exists. Will reuse it. (GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/markers.fasta)
[2023-06-17 06:45:31,369] [INFO] Task started: Blastn
[2023-06-17 06:45:31,370] [INFO] Running command: blastn -query GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/markers.fasta -db /var/lib/cwl/stg167ac4d0-0e12-46fa-9462-71c06b573cd6/dqc_reference/reference_markers_gtdb.fasta -out GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-17 06:45:32,501] [INFO] Task succeeded: Blastn
[2023-06-17 06:45:32,507] [INFO] Selected 9 target genomes.
[2023-06-17 06:45:32,507] [INFO] Target genome list was writen to GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/target_genomes_gtdb.txt
[2023-06-17 06:45:32,511] [INFO] Task started: fastANI
[2023-06-17 06:45:32,512] [INFO] Running command: fastANI --query /var/lib/cwl/stg81bd9589-799a-4cf9-ac7b-daf00f9edf10/GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna.gz --refList GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/target_genomes_gtdb.txt --output GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-17 06:45:38,735] [INFO] Task succeeded: fastANI
[2023-06-17 06:45:38,750] [INFO] Found 9 fastANI hits (1 hits with ANI > circumscription radius)
[2023-06-17 06:45:38,750] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCA_900770535.1	s__Ventrimonas sp900770535	98.2676	464	844	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Ventrimonas	95.0	98.72	98.72	0.86	0.86	2	conclusive
GCF_003478505.1	s__Ventrimonas sp003478505	89.2584	638	844	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Ventrimonas	95.0	99.07	97.61	0.91	0.78	6	-
GCF_003480315.1	s__Ventrimonas sp003480315	87.1541	609	844	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Ventrimonas	95.0	98.62	98.00	0.91	0.88	5	-
GCF_003481985.1	s__Ventrimonas sp003506385	85.1842	622	844	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Ventrimonas	95.0	98.83	98.43	0.90	0.84	6	-
GCF_003481825.1	s__Ventrimonas sp003481825	84.368	585	844	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Ventrimonas	95.0	99.33	98.65	0.93	0.87	3	-
GCF_003024715.1	s__Clostridium_Q fessum	78.447	150	844	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Clostridium_Q	95.0	98.33	97.70	0.88	0.81	31	-
GCA_900538475.1	s__Ventrimonas sp900538475	78.3358	186	844	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Ventrimonas	95.0	99.39	99.20	0.94	0.93	3	-
GCA_900540335.1	s__Ventrimonas sp900540335	78.2119	185	844	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Ventrimonas	95.0	99.37	99.36	0.86	0.81	3	-
GCA_910577765.1	s__Ventrimonas sp910577765	77.3397	146	844	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Ventrimonas	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2023-06-17 06:45:38,752] [INFO] GTDB search result was written to GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/result_gtdb.tsv
[2023-06-17 06:45:38,753] [INFO] ===== GTDB Search completed =====
[2023-06-17 06:45:38,757] [INFO] DFAST_QC result json was written to GCA_905202645.1_ERR1430405-mag-bin.30_genomic.fna/dqc_result.json
[2023-06-17 06:45:38,757] [INFO] DFAST_QC completed!
[2023-06-17 06:45:38,757] [INFO] Total running time: 0h1m0s