[2024-01-24 14:12:03,455] [INFO] DFAST_QC pipeline started.
[2024-01-24 14:12:03,457] [INFO] DFAST_QC version: 0.5.7
[2024-01-24 14:12:03,458] [INFO] DQC Reference Directory: /var/lib/cwl/stg20e0e3b5-8b80-443c-af61-c0a5a31c3ca3/dqc_reference
[2024-01-24 14:12:04,758] [INFO] ===== Start taxonomy check using ANI =====
[2024-01-24 14:12:04,759] [INFO] Task started: Prodigal
[2024-01-24 14:12:04,759] [INFO] Running command: gunzip -c /var/lib/cwl/stgefc2e3b9-f7cc-4b79-ace8-2e1ebc4dc3ec/GCF_002196895.1_ASM219689v1_genomic.fna.gz | prodigal -d GCF_002196895.1_ASM219689v1_genomic.fna/cds.fna -a GCF_002196895.1_ASM219689v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2024-01-24 14:12:17,581] [INFO] Task succeeded: Prodigal
[2024-01-24 14:12:17,582] [INFO] Task started: HMMsearch
[2024-01-24 14:12:17,582] [INFO] Running command: hmmsearch --tblout GCF_002196895.1_ASM219689v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg20e0e3b5-8b80-443c-af61-c0a5a31c3ca3/dqc_reference/reference_markers.hmm GCF_002196895.1_ASM219689v1_genomic.fna/protein.faa > /dev/null
[2024-01-24 14:12:17,935] [INFO] Task succeeded: HMMsearch
[2024-01-24 14:12:17,937] [INFO] Found 6/6 markers.
[2024-01-24 14:12:17,989] [INFO] Query marker FASTA was written to GCF_002196895.1_ASM219689v1_genomic.fna/markers.fasta
[2024-01-24 14:12:17,989] [INFO] Task started: Blastn
[2024-01-24 14:12:17,989] [INFO] Running command: blastn -query GCF_002196895.1_ASM219689v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg20e0e3b5-8b80-443c-af61-c0a5a31c3ca3/dqc_reference/reference_markers.fasta -out GCF_002196895.1_ASM219689v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 14:12:19,081] [INFO] Task succeeded: Blastn
[2024-01-24 14:12:19,084] [INFO] Selected 17 target genomes.
[2024-01-24 14:12:19,085] [INFO] Target genome list was writen to GCF_002196895.1_ASM219689v1_genomic.fna/target_genomes.txt
[2024-01-24 14:12:19,092] [INFO] Task started: fastANI
[2024-01-24 14:12:19,092] [INFO] Running command: fastANI --query /var/lib/cwl/stgefc2e3b9-f7cc-4b79-ace8-2e1ebc4dc3ec/GCF_002196895.1_ASM219689v1_genomic.fna.gz --refList GCF_002196895.1_ASM219689v1_genomic.fna/target_genomes.txt --output GCF_002196895.1_ASM219689v1_genomic.fna/fastani_result.tsv --threads 1
[2024-01-24 14:12:32,141] [INFO] Task succeeded: fastANI
[2024-01-24 14:12:32,142] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg20e0e3b5-8b80-443c-af61-c0a5a31c3ca3/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2024-01-24 14:12:32,142] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg20e0e3b5-8b80-443c-af61-c0a5a31c3ca3/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2024-01-24 14:12:32,161] [INFO] Found 17 fastANI hits (2 hits with ANI > threshold)
[2024-01-24 14:12:32,161] [INFO] The taxonomy check result is classified as 'conclusive'.
[2024-01-24 14:12:32,161] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Haematobacter missouriensis	strain=H1892	GCA_002196895.1	366616	366616	type	True	100.0	1373	1382	95	conclusive
Haematobacter missouriensis	strain=CCUG 52307	GCA_000740775.1	366616	366616	type	True	99.9596	1235	1382	95	conclusive
Haematobacter massiliensis	strain=CCUG 47968	GCA_000740795.1	195105	195105	type	True	88.4085	1062	1382	95	below_threshold
Cereibacter sediminicola	strain=JA983	GCA_007668225.1	2584941	2584941	type	True	78.4839	323	1382	95	below_threshold
Thioclava atlantica	strain=13D2W-2	GCA_000737065.1	1317124	1317124	type	True	78.1789	285	1382	95	below_threshold
Cereibacter azotoformans	strain=KA25	GCA_003050905.1	43057	43057	type	True	78.09	303	1382	95	below_threshold
Cereibacter ovatus	strain=JA234	GCA_900207575.1	439529	439529	type	True	78.0659	282	1382	95	below_threshold
Cereibacter johrii	strain=JA192	GCA_003046325.1	445629	445629	type	True	78.0349	343	1382	95	below_threshold
Cereibacter johrii	strain=JA192	GCA_001720585.1	445629	445629	type	True	78.0018	326	1382	95	below_threshold
Rhodovulum visakhapatnamense	strain=JA181	GCA_004365965.1	364297	364297	type	True	77.966	291	1382	95	below_threshold
Sinirhodobacter huangdaonensis	strain=CGMCC 1.12963	GCA_004022465.1	2501515	2501515	type	True	77.7148	303	1382	95	below_threshold
Rhodobacter amnigenus	strain=HSP-20	GCA_009908265.2	2852097	2852097	type	True	77.6458	274	1382	95	below_threshold
Paracoccus isoporae	strain=DSM 22220	GCA_900101865.1	591205	591205	type	True	77.5443	248	1382	95	below_threshold
Rhodovulum tesquicola	strain=A-36s	GCA_024128855.1	540254	540254	type	True	77.4698	279	1382	95	below_threshold
Antarcticimicrobium luteum	strain=318-1	GCA_004358185.1	2547397	2547397	type	True	77.4304	248	1382	95	below_threshold
Rhodovulum sulfidophilum	strain=DSM 1374	GCA_001633165.1	35806	35806	type	True	77.3559	263	1382	95	below_threshold
Chachezhania sediminis	strain=CAU 1508	GCA_009765275.1	2599291	2599291	type	True	77.1714	226	1382	95	below_threshold
--------------------------------------------------------------------------------
[2024-01-24 14:12:32,163] [INFO] DFAST Taxonomy check result was written to GCF_002196895.1_ASM219689v1_genomic.fna/tc_result.tsv
[2024-01-24 14:12:32,163] [INFO] ===== Taxonomy check completed =====
[2024-01-24 14:12:32,163] [INFO] ===== Start completeness check using CheckM =====
[2024-01-24 14:12:32,163] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg20e0e3b5-8b80-443c-af61-c0a5a31c3ca3/dqc_reference/checkm_data
[2024-01-24 14:12:32,164] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2024-01-24 14:12:32,205] [INFO] Task started: CheckM
[2024-01-24 14:12:32,205] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCF_002196895.1_ASM219689v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCF_002196895.1_ASM219689v1_genomic.fna/checkm_input GCF_002196895.1_ASM219689v1_genomic.fna/checkm_result
[2024-01-24 14:13:12,176] [INFO] Task succeeded: CheckM
[2024-01-24 14:13:12,177] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2024-01-24 14:13:12,195] [INFO] ===== Completeness check finished =====
[2024-01-24 14:13:12,196] [INFO] ===== Start GTDB Search =====
[2024-01-24 14:13:12,196] [INFO] Query marker FASTA already exists. Will reuse it. (GCF_002196895.1_ASM219689v1_genomic.fna/markers.fasta)
[2024-01-24 14:13:12,197] [INFO] Task started: Blastn
[2024-01-24 14:13:12,197] [INFO] Running command: blastn -query GCF_002196895.1_ASM219689v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg20e0e3b5-8b80-443c-af61-c0a5a31c3ca3/dqc_reference/reference_markers_gtdb.fasta -out GCF_002196895.1_ASM219689v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 14:13:13,977] [INFO] Task succeeded: Blastn
[2024-01-24 14:13:13,980] [INFO] Selected 16 target genomes.
[2024-01-24 14:13:13,980] [INFO] Target genome list was writen to GCF_002196895.1_ASM219689v1_genomic.fna/target_genomes_gtdb.txt
[2024-01-24 14:13:14,002] [INFO] Task started: fastANI
[2024-01-24 14:13:14,003] [INFO] Running command: fastANI --query /var/lib/cwl/stgefc2e3b9-f7cc-4b79-ace8-2e1ebc4dc3ec/GCF_002196895.1_ASM219689v1_genomic.fna.gz --refList GCF_002196895.1_ASM219689v1_genomic.fna/target_genomes_gtdb.txt --output GCF_002196895.1_ASM219689v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2024-01-24 14:13:25,275] [INFO] Task succeeded: fastANI
[2024-01-24 14:13:25,292] [INFO] Found 16 fastANI hits (1 hits with ANI > circumscription radius)
[2024-01-24 14:13:25,292] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_000740775.1	s__Haematobacter missouriensis	99.9596	1235	1382	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Haematobacter	95.0	99.82	99.68	0.96	0.95	3	conclusive
GCF_002196855.1	s__Haematobacter sp002196855	93.0584	1137	1382	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Haematobacter	95.0	98.61	98.61	0.86	0.86	2	-
GCF_000740795.1	s__Haematobacter massiliensis	88.4087	1063	1382	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Haematobacter	95.0	99.83	99.63	0.96	0.93	4	-
GCF_001620265.1	s__Frigidibacter mobilis	78.4532	376	1382	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Frigidibacter	95.0	N/A	N/A	N/A	N/A	1	-
GCA_017599285.1	s__Cereibacter_A azotoformans	78.2005	263	1382	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Cereibacter_A	95.0	98.54	95.54	0.92	0.86	6	-
GCF_003993775.1	s__Frigidibacter sp003993775	78.1777	362	1382	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Frigidibacter	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002407205.1	s__Cereibacter_A sp002407205	78.0159	348	1382	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Cereibacter_A	95.0	N/A	N/A	N/A	N/A	1	-
GCA_002423245.1	s__Frigidibacter sp002423245	77.996	306	1382	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Frigidibacter	95.0	98.55	98.55	0.85	0.85	2	-
GCA_008933605.1	s__Albidovulum sp008933605	77.7848	240	1382	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Albidovulum	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002871005.1	s__Acidimangrovimonas sediminis	77.595	299	1382	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Acidimangrovimonas	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900101865.1	s__Paracoccus isoporae	77.5443	248	1382	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Paracoccus	95.0	N/A	N/A	N/A	N/A	1	-
GCA_011620265.1	s__Albidovulum sp011620265	77.5248	287	1382	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Albidovulum	95.0	N/A	N/A	N/A	N/A	1	-
GCA_002280515.1	s__Albidovulum sp002280515	77.5126	268	1382	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Albidovulum	95.0	N/A	N/A	N/A	N/A	1	-
GCA_003255745.1	s__Paracoccus saliphilus_A	77.4352	199	1382	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Paracoccus	95.0	N/A	N/A	N/A	N/A	1	-
GCA_003249215.1	s__SZUA-611 sp003249215	77.268	254	1382	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__SZUA-611	95.0	N/A	N/A	N/A	N/A	1	-
GCA_003551745.1	s__Rhodobaculum sp003551745	76.8858	152	1382	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Rhodobaculum	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2024-01-24 14:13:25,294] [INFO] GTDB search result was written to GCF_002196895.1_ASM219689v1_genomic.fna/result_gtdb.tsv
[2024-01-24 14:13:25,295] [INFO] ===== GTDB Search completed =====
[2024-01-24 14:13:25,299] [INFO] DFAST_QC result json was written to GCF_002196895.1_ASM219689v1_genomic.fna/dqc_result.json
[2024-01-24 14:13:25,299] [INFO] DFAST_QC completed!
[2024-01-24 14:13:25,299] [INFO] Total running time: 0h1m22s