[2023-06-28 20:40:52,058] [INFO] DFAST_QC pipeline started.
[2023-06-28 20:40:52,061] [INFO] DFAST_QC version: 0.5.7
[2023-06-28 20:40:52,061] [INFO] DQC Reference Directory: /var/lib/cwl/stgf97967a0-857d-49e4-b28c-664050765279/dqc_reference
[2023-06-28 20:40:54,146] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-28 20:40:54,147] [INFO] Task started: Prodigal
[2023-06-28 20:40:54,147] [INFO] Running command: gunzip -c /var/lib/cwl/stg0e292d0c-8588-48af-836a-0919bbdf988a/GCA_020832515.1_ASM2083251v1_genomic.fna.gz | prodigal -d GCA_020832515.1_ASM2083251v1_genomic.fna/cds.fna -a GCA_020832515.1_ASM2083251v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-28 20:41:05,265] [INFO] Task succeeded: Prodigal
[2023-06-28 20:41:05,265] [INFO] Task started: HMMsearch
[2023-06-28 20:41:05,265] [INFO] Running command: hmmsearch --tblout GCA_020832515.1_ASM2083251v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stgf97967a0-857d-49e4-b28c-664050765279/dqc_reference/reference_markers.hmm GCA_020832515.1_ASM2083251v1_genomic.fna/protein.faa > /dev/null
[2023-06-28 20:41:05,550] [INFO] Task succeeded: HMMsearch
[2023-06-28 20:41:05,552] [WARNING] Found 5/6 markers. [/var/lib/cwl/stg0e292d0c-8588-48af-836a-0919bbdf988a/GCA_020832515.1_ASM2083251v1_genomic.fna.gz]
[2023-06-28 20:41:05,595] [INFO] Query marker FASTA was written to GCA_020832515.1_ASM2083251v1_genomic.fna/markers.fasta
[2023-06-28 20:41:05,595] [INFO] Task started: Blastn
[2023-06-28 20:41:05,596] [INFO] Running command: blastn -query GCA_020832515.1_ASM2083251v1_genomic.fna/markers.fasta -db /var/lib/cwl/stgf97967a0-857d-49e4-b28c-664050765279/dqc_reference/reference_markers.fasta -out GCA_020832515.1_ASM2083251v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-28 20:41:06,426] [INFO] Task succeeded: Blastn
[2023-06-28 20:41:06,432] [INFO] Selected 24 target genomes.
[2023-06-28 20:41:06,432] [INFO] Target genome list was writen to GCA_020832515.1_ASM2083251v1_genomic.fna/target_genomes.txt
[2023-06-28 20:41:06,434] [INFO] Task started: fastANI
[2023-06-28 20:41:06,434] [INFO] Running command: fastANI --query /var/lib/cwl/stg0e292d0c-8588-48af-836a-0919bbdf988a/GCA_020832515.1_ASM2083251v1_genomic.fna.gz --refList GCA_020832515.1_ASM2083251v1_genomic.fna/target_genomes.txt --output GCA_020832515.1_ASM2083251v1_genomic.fna/fastani_result.tsv --threads 1
[2023-06-28 20:41:24,904] [INFO] Task succeeded: fastANI
[2023-06-28 20:41:24,904] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stgf97967a0-857d-49e4-b28c-664050765279/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-28 20:41:24,905] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stgf97967a0-857d-49e4-b28c-664050765279/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-28 20:41:24,925] [INFO] Found 24 fastANI hits (0 hits with ANI > threshold)
[2023-06-28 20:41:24,925] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-06-28 20:41:24,926] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Rhodovulum tesquicola	strain=A-36s	GCA_024128855.1	540254	540254	type	True	78.2939	326	1164	95	below_threshold
Rhodovulum steppense	strain=DSM 21153	GCA_004339675.1	540251	540251	type	True	78.2757	349	1164	95	below_threshold
Limimaricola hongkongensis	strain=DSM 17492	GCA_000600975.2	278132	278132	type	True	78.2025	334	1164	95	below_threshold
Limimaricola hongkongensis	strain=UST950701-009P	GCA_000365005.1	278132	278132	type	True	78.1822	343	1164	95	below_threshold
Limimaricola pyoseonensis	strain=DSM 21424	GCA_900102015.1	521013	521013	type	True	78.0922	370	1164	95	below_threshold
Limimaricola variabilis	strain=CECT 8572	GCA_014195545.1	1492771	1492771	type	True	77.9352	319	1164	95	below_threshold
Salibaculum halophilum	strain=WDS1C4	GCA_002094885.1	1914408	1914408	type	True	77.9201	320	1164	95	below_threshold
Cereibacter ovatus	strain=JA234	GCA_900207575.1	439529	439529	type	True	77.906	330	1164	95	below_threshold
Cereibacter sediminicola	strain=JA983	GCA_007668225.1	2584941	2584941	type	True	77.8657	331	1164	95	below_threshold
Rhodovulum visakhapatnamense	strain=JA181	GCA_004365965.1	364297	364297	type	True	77.8606	352	1164	95	below_threshold
Alexandriicola marinus	strain=LZ-14	GCA_004000435.1	2081710	2081710	type	True	77.7555	369	1164	95	below_threshold
Roseibacterium elongatum	strain=DFL-43	GCA_000590925.1	159346	159346	type	True	77.7233	269	1164	95	below_threshold
Loktanella fryxellensis	strain=DSM 16213	GCA_900110065.1	245187	245187	type	True	77.597	294	1164	95	below_threshold
Rubellimicrobium aerolatum	strain=DSM 19297	GCA_017872975.1	490979	490979	type	True	77.5447	296	1164	95	below_threshold
Maribius salinus	strain=DSM 26892	GCA_900141995.1	313368	313368	type	True	77.4018	302	1164	95	below_threshold
Roseovarius indicus	strain=B108	GCA_001441635.1	540747	540747	type	True	77.3771	305	1164	95	below_threshold
Salipiger profundus	strain=CGMCC 1.12377	GCA_014637265.1	1229727	1229727	type	True	77.3431	295	1164	95	below_threshold
Flavimaricola marinus	strain=CECT 8899	GCA_900184895.1	1819565	1819565	type	True	77.3402	304	1164	95	below_threshold
Tabrizicola algicola	strain=ETT8	GCA_010915705.1	2709381	2709381	type	True	77.3339	298	1164	95	below_threshold
Roseovarius indicus	strain=DSM 26383	GCA_008728195.1	540747	540747	type	True	77.3129	306	1164	95	below_threshold
Rubellimicrobium roseum	strain=YIM 48858	GCA_006152145.1	687525	687525	type	True	77.2832	344	1164	95	below_threshold
Pseudooceanicola endophyticus	strain=CBS1P-1	GCA_018760365.1	2841273	2841273	type	True	77.2821	327	1164	95	below_threshold
Roseivivax isoporae	strain=LMG 25204	GCA_000521865.1	591206	591206	type	True	77.2525	307	1164	95	below_threshold
Palleronia rufa	strain=MOLA 401	GCA_000743715.1	1530186	1530186	type	True	77.1996	225	1164	95	below_threshold
--------------------------------------------------------------------------------
[2023-06-28 20:41:24,928] [INFO] DFAST Taxonomy check result was written to GCA_020832515.1_ASM2083251v1_genomic.fna/tc_result.tsv
[2023-06-28 20:41:24,929] [INFO] ===== Taxonomy check completed =====
[2023-06-28 20:41:24,929] [INFO] ===== Start completeness check using CheckM =====
[2023-06-28 20:41:24,930] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stgf97967a0-857d-49e4-b28c-664050765279/dqc_reference/checkm_data
[2023-06-28 20:41:24,931] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-28 20:41:24,976] [INFO] Task started: CheckM
[2023-06-28 20:41:24,976] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_020832515.1_ASM2083251v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_020832515.1_ASM2083251v1_genomic.fna/checkm_input GCA_020832515.1_ASM2083251v1_genomic.fna/checkm_result
[2023-06-28 20:42:02,101] [INFO] Task succeeded: CheckM
[2023-06-28 20:42:02,103] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 69.53%
Contamintation: 0.52%
Strain heterogeneity: 100.00%
--------------------------------------------------------------------------------
[2023-06-28 20:42:02,131] [INFO] ===== Completeness check finished =====
[2023-06-28 20:42:02,132] [INFO] ===== Start GTDB Search =====
[2023-06-28 20:42:02,132] [INFO] Query marker FASTA already exists. Will reuse it. (GCA_020832515.1_ASM2083251v1_genomic.fna/markers.fasta)
[2023-06-28 20:42:02,132] [INFO] Task started: Blastn
[2023-06-28 20:42:02,133] [INFO] Running command: blastn -query GCA_020832515.1_ASM2083251v1_genomic.fna/markers.fasta -db /var/lib/cwl/stgf97967a0-857d-49e4-b28c-664050765279/dqc_reference/reference_markers_gtdb.fasta -out GCA_020832515.1_ASM2083251v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-28 20:42:03,404] [INFO] Task succeeded: Blastn
[2023-06-28 20:42:03,410] [INFO] Selected 29 target genomes.
[2023-06-28 20:42:03,410] [INFO] Target genome list was writen to GCA_020832515.1_ASM2083251v1_genomic.fna/target_genomes_gtdb.txt
[2023-06-28 20:42:03,421] [INFO] Task started: fastANI
[2023-06-28 20:42:03,422] [INFO] Running command: fastANI --query /var/lib/cwl/stg0e292d0c-8588-48af-836a-0919bbdf988a/GCA_020832515.1_ASM2083251v1_genomic.fna.gz --refList GCA_020832515.1_ASM2083251v1_genomic.fna/target_genomes_gtdb.txt --output GCA_020832515.1_ASM2083251v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-28 20:42:22,688] [INFO] Task succeeded: fastANI
[2023-06-28 20:42:22,715] [INFO] Found 29 fastANI hits (0 hits with ANI > circumscription radius)
[2023-06-28 20:42:22,715] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCA_009993005.1	s__JAACUH01 sp009993005	79.1279	461	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__JAACUH01	95.0	99.57	99.57	0.91	0.91	2	-
GCA_001314655.1	s__Salibaculum sp001314655	78.4751	411	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Salibaculum	95.0	N/A	N/A	N/A	N/A	1	-
GCF_009649175.1	s__Rhodovulum strictum	78.2815	364	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Rhodovulum	95.0	N/A	N/A	N/A	N/A	1	-
GCA_015689745.1	s__Roseicyclus sp015689745	78.1962	321	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Roseicyclus	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000600975.2	s__Limimaricola hongkongensis	78.1924	335	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Limimaricola	95.0	99.99	99.99	1.00	1.00	2	-
GCF_900102015.1	s__Limimaricola pyoseonensis	78.0574	373	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Limimaricola	95.0	N/A	N/A	N/A	N/A	1	-
GCA_001650895.1	s__EhC02 sp001650895	78.0436	320	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__EhC02	95.0	N/A	N/A	N/A	N/A	1	-
GCF_012395815.1	s__Roseicyclus sp012395815	77.9323	365	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Roseicyclus	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001482405.1	s__Ponticoccus marisrubri	77.8851	387	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ponticoccus	95.0	N/A	N/A	N/A	N/A	1	-
GCF_004365965.1	s__Rhodovulum visakhapatnamense	77.8515	353	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Rhodovulum	95.0	98.55	98.44	0.90	0.89	5	-
GCA_014859945.1	s__UBA1943 sp014859945	77.85	299	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__UBA1943	95.0	N/A	N/A	N/A	N/A	1	-
GCA_003511785.1	s__UBA7951 sp003511785	77.8246	254	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__UBA7951	95.0	N/A	N/A	N/A	N/A	1	-
GCA_007131945.1	s__Pararhodobacter sp007131945	77.766	277	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Pararhodobacter	95.0	99.16	99.15	0.86	0.82	3	-
GCA_009920775.1	s__Marivivens sp009920775	77.7474	204	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Marivivens	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001975705.1	s__Salipiger abyssi	77.7388	327	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Salipiger	95.0	N/A	N/A	N/A	N/A	1	-
GCF_004343075.1	s__Rhodovulum marinum	77.7205	340	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Rhodovulum	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016124615.1	s__Solirhodobacter sp016124615	77.7182	363	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Solirhodobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000153305.1	s__Oceanicola granulosus	77.6436	342	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Oceanicola	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900110065.1	s__Loktanella fryxellensis	77.6049	294	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Loktanella	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000299575.1	s__Palleronia guishaninsula	77.5573	262	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Palleronia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_004145845.1	s__Marivivens sp004145845	77.5473	297	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Marivivens	95.0	N/A	N/A	N/A	N/A	1	-
GCF_017916275.1	s__Rubellimicrobium sp017916275	77.5365	306	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Rubellimicrobium	95.0	100.00	100.00	1.00	1.00	2	-
GCA_016278295.1	s__Roseovarius sp016278295	77.4041	238	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Roseovarius	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003116585.1	s__Albibacillus kandeliae	77.3778	288	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Albibacillus	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016792925.1	s__Gemmobacter sp016792925	77.3135	267	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Gemmobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCF_008728195.1	s__Roseovarius indicus	77.3038	307	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Roseovarius	95.0	99.18	97.54	0.97	0.92	4	-
GCA_015689685.1	s__HKCCE3408 sp015689685	77.2077	293	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__HKCCE3408	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900116005.1	s__Poseidonocella sedimentorum	77.1116	232	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Poseidonocella	95.0	N/A	N/A	N/A	N/A	1	-
GCA_013042605.1	s__IMCC34051 sp013042605	77.0474	209	1164	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__IMCC34051	95.0	99.10	98.86	0.88	0.86	4	-
--------------------------------------------------------------------------------
[2023-06-28 20:42:22,718] [INFO] GTDB search result was written to GCA_020832515.1_ASM2083251v1_genomic.fna/result_gtdb.tsv
[2023-06-28 20:42:22,719] [INFO] ===== GTDB Search completed =====
[2023-06-28 20:42:22,725] [INFO] DFAST_QC result json was written to GCA_020832515.1_ASM2083251v1_genomic.fna/dqc_result.json
[2023-06-28 20:42:22,725] [INFO] DFAST_QC completed!
[2023-06-28 20:42:22,725] [INFO] Total running time: 0h1m31s