[2024-01-25 19:55:20,898] [INFO] DFAST_QC pipeline started.
[2024-01-25 19:55:20,899] [INFO] DFAST_QC version: 0.5.7
[2024-01-25 19:55:20,900] [INFO] DQC Reference Directory: /var/lib/cwl/stg1a526abf-09a2-4f30-9954-9372c261df98/dqc_reference
[2024-01-25 19:55:22,062] [INFO] ===== Start taxonomy check using ANI =====
[2024-01-25 19:55:22,063] [INFO] Task started: Prodigal
[2024-01-25 19:55:22,063] [INFO] Running command: gunzip -c /var/lib/cwl/stge7993a52-a817-4bba-9631-dab8cdb8e7e4/GCF_025345565.1_ASM2534556v1_genomic.fna.gz | prodigal -d GCF_025345565.1_ASM2534556v1_genomic.fna/cds.fna -a GCF_025345565.1_ASM2534556v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2024-01-25 19:55:31,573] [INFO] Task succeeded: Prodigal
[2024-01-25 19:55:31,573] [INFO] Task started: HMMsearch
[2024-01-25 19:55:31,573] [INFO] Running command: hmmsearch --tblout GCF_025345565.1_ASM2534556v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg1a526abf-09a2-4f30-9954-9372c261df98/dqc_reference/reference_markers.hmm GCF_025345565.1_ASM2534556v1_genomic.fna/protein.faa > /dev/null
[2024-01-25 19:55:31,790] [INFO] Task succeeded: HMMsearch
[2024-01-25 19:55:31,791] [INFO] Found 6/6 markers.
[2024-01-25 19:55:31,821] [INFO] Query marker FASTA was written to GCF_025345565.1_ASM2534556v1_genomic.fna/markers.fasta
[2024-01-25 19:55:31,821] [INFO] Task started: Blastn
[2024-01-25 19:55:31,821] [INFO] Running command: blastn -query GCF_025345565.1_ASM2534556v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg1a526abf-09a2-4f30-9954-9372c261df98/dqc_reference/reference_markers.fasta -out GCF_025345565.1_ASM2534556v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-25 19:55:32,583] [INFO] Task succeeded: Blastn
[2024-01-25 19:55:32,586] [INFO] Selected 19 target genomes.
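The Blastn step writes BLAST+ tabular output (`-outfmt 6`), whose 12 default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The log does not show how DFAST_QC turns those hits into the 19 target genomes, so what follows is only a hypothetical sketch: it ranks the distinct subject IDs in `blast.markers.tsv` by their best bitscore across all query markers. The name `rank_subjects` is illustrative, and mapping a subject ID back to its source genome is deliberately left out because the reference ID format is not visible here.

```python
"""Rank candidate reference sequences from BLAST+ tabular output.

Hypothetical sketch: DFAST_QC's real target-genome selection is not shown
in the log, so this only keeps each distinct subject ID with its best
bitscore across all query markers.
"""
from typing import Iterable

# The 12 default columns of BLAST+ -outfmt 6.
OUTFMT6_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def rank_subjects(lines: Iterable[str]) -> list[str]:
    """Return distinct sseqid values, highest bitscore first."""
    best: dict[str, float] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < len(OUTFMT6_COLUMNS):
            continue  # skip blank or truncated lines
        row = dict(zip(OUTFMT6_COLUMNS, fields))
        subject, score = row["sseqid"], float(row["bitscore"])
        # Remember only the strongest hit seen for each subject.
        if score > best.get(subject, float("-inf")):
            best[subject] = score
    # Descending bitscore; ties broken by ID so the output is deterministic.
    return sorted(best, key=lambda s: (-best[s], s))
```

Passing an open handle on `GCF_025345565.1_ASM2534556v1_genomic.fna/blast.markers.tsv` to `rank_subjects` would list the candidate subjects; keeping only the best hit per subject avoids counting a reference several times when more than one marker matches it.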
[2024-01-25 19:55:32,586] [INFO] Target genome list was writen to GCF_025345565.1_ASM2534556v1_genomic.fna/target_genomes.txt
[2024-01-25 19:55:32,597] [INFO] Task started: fastANI
[2024-01-25 19:55:32,598] [INFO] Running command: fastANI --query /var/lib/cwl/stge7993a52-a817-4bba-9631-dab8cdb8e7e4/GCF_025345565.1_ASM2534556v1_genomic.fna.gz --refList GCF_025345565.1_ASM2534556v1_genomic.fna/target_genomes.txt --output GCF_025345565.1_ASM2534556v1_genomic.fna/fastani_result.tsv --threads 1
[2024-01-25 19:55:45,901] [INFO] Task succeeded: fastANI
[2024-01-25 19:55:45,901] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg1a526abf-09a2-4f30-9954-9372c261df98/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2024-01-25 19:55:45,901] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg1a526abf-09a2-4f30-9954-9372c261df98/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2024-01-25 19:55:45,912] [INFO] Found 18 fastANI hits (0 hits with ANI > threshold)
[2024-01-25 19:55:45,912] [INFO] The taxonomy check result is classified as 'below_threshold'.
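With no species-specific threshold file, every fastANI hit in this run is compared against one default cutoff (95, as reported in the `ani_threshold` column of the taxonomy check result). A minimal sketch of that comparison, assuming fastANI's documented tab-separated output (query, reference, ANI, matched fragments, total query fragments), might look like this. `parse_fastani`, `overall_status` and the `above_threshold` label are hypothetical; DFAST_QC's own status rules are not reproduced here.

```python
"""Compare fastANI hits with a single ANI threshold.

Sketch under stated assumptions: input follows fastANI's tab-separated
output (query, reference, ANI, matched fragments, total query fragments).
The "above_threshold" label is a hypothetical placeholder.
"""
from dataclasses import dataclass
from typing import Iterable

DEFAULT_ANI_THRESHOLD = 95.0  # value reported in the ani_threshold column


@dataclass
class AniHit:
    reference: str
    ani: float
    matched_fragments: int
    total_fragments: int

    @property
    def aligned_fraction(self) -> float:
        # Share of the query's fragments that mapped to this reference.
        return self.matched_fragments / self.total_fragments


def parse_fastani(lines: Iterable[str]) -> list[AniHit]:
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        _query, reference, ani, matched, total = fields[:5]
        hits.append(AniHit(reference, float(ani), int(matched), int(total)))
    # Highest ANI first, like the taxonomy check result table.
    return sorted(hits, key=lambda hit: hit.ani, reverse=True)


def overall_status(hits: list[AniHit], threshold: float = DEFAULT_ANI_THRESHOLD) -> str:
    # Strict '>' mirrors "hits with ANI > threshold" in the log.
    if any(hit.ani > threshold for hit in hits):
        return "above_threshold"  # hypothetical label
    return "below_threshold"
```

Applied to the top hit of this run (Thalassolituus marinus, GCA_020320395.1, 79.7323% ANI with 572 of 1355 fragments matched), the sketch gives an aligned fraction of about 0.42 and an overall status of `below_threshold`, consistent with the log.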
[2024-01-25 19:55:45,913] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name                                strain                accession        taxid    species_taxid  relation_to_type  validated  ani      matched_fragments  total_fragments  ani_threshold  status
Thalassolituus marinus                       strain=IMCC1826       GCA_020320395.1  671053   671053         type              True       79.7323  572                1355             95             below_threshold
Thalassolituus oleivorans                    strain=MIL-1          GCA_000355675.1  187493   187493         type              True       78.4853  221                1355             95             below_threshold
Oleibacter marinus                           strain=DSM 24913      GCA_900156675.1  484498   484498         type              True       78.3708  228                1355             95             below_threshold
Oceanobacter kriegii                         strain=DSM 6294       GCA_000422845.1  64972    64972          type              True       78.3661  282                1355             95             below_threshold
Oceanobacter mangrovi                        strain=SM2-42         GCA_019740315.1  2862510  2862510        type              True       78.2593  287                1355             95             below_threshold
Pseudocitrobacter corydidari                                       GCA_021172065.1  2891570  2891570        type              True       78.0025  59                 1355             95             below_threshold
Enterobacter hormaechei                      strain=FDAARGOS 1433  GCA_019048245.1  158836   158836         suspected-type    True       77.8137  68                 1355             95             below_threshold
Bacterioplanes sanyensis                     strain=KCTC 32220     GCA_014652515.1  1249553  1249553        type              True       77.4959  254                1355             95             below_threshold
Stutzerimonas degradans                      strain=FDAARGOS_876   GCA_016028635.1  2968968  2968968        suspected-type    True       76.6919  80                 1355             95             below_threshold
Marinobacterium arenosum                     strain=CAU 1594       GCA_019795155.1  2862496  2862496        type              True       76.5804  94                 1355             95             below_threshold
Halopseudomonas oceani                       strain=CGMCC 1.15195  GCA_014641295.1  1708783  1708783        type              True       76.5248  90                 1355             95             below_threshold
Enterobacter wuhouensis                      strain=WCHEW120002    GCA_004331265.1  2529381  2529381        type              True       76.5083  52                 1355             95             below_threshold
Halopseudomonas oceani                       strain=DSM 100277     GCA_002903165.1  1708783  1708783        type              True       76.4276  91                 1355             95             below_threshold
Marinobacterium halophilum                   strain=DSM 17586      GCA_003014615.1  267374   267374         type              True       76.4192  91                 1355             95             below_threshold
Marinobacterium litorale                     strain=DSM 23545      GCA_000428985.1  404770   404770         type              True       76.4169  71                 1355             95             below_threshold
Marinobacter shengliensis subsp. alexandrii  strain=LZ-6           GCA_005871095.1  2570350  1389223        type              True       76.1519  84                 1355             95             below_threshold
Marinobacter daepoensis                      strain=DSM 16072      GCA_000421165.1  262077   262077         type              True       75.9987  65                 1355             95             below_threshold
Marinobacterium nitratireducens              strain=CGMCC 1.7286   GCA_014645375.1  518897   518897         type              True       75.9526  69                 1355             95             below_threshold
--------------------------------------------------------------------------------
[2024-01-25 19:55:45,914] [INFO] DFAST Taxonomy check result was written to GCF_025345565.1_ASM2534556v1_genomic.fna/tc_result.tsv
[2024-01-25 19:55:45,914] [INFO] ===== Taxonomy check completed =====
[2024-01-25 19:55:45,914] [INFO] ===== Start completeness check using CheckM =====
[2024-01-25 19:55:45,915] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg1a526abf-09a2-4f30-9954-9372c261df98/dqc_reference/checkm_data
[2024-01-25 19:55:45,915] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2024-01-25 19:55:45,957] [INFO] Task started: CheckM
[2024-01-25 19:55:45,957] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCF_025345565.1_ASM2534556v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCF_025345565.1_ASM2534556v1_genomic.fna/checkm_input GCF_025345565.1_ASM2534556v1_genomic.fna/checkm_result
[2024-01-25 19:56:17,874] [INFO] Task succeeded: CheckM
[2024-01-25 19:56:17,877] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2024-01-25 19:56:17,895] [INFO] ===== Completeness check finished =====
[2024-01-25 19:56:17,896] [INFO] ===== Start GTDB Search =====
[2024-01-25 19:56:17,897] [INFO] Query marker FASTA already exists. Will reuse it. (GCF_025345565.1_ASM2534556v1_genomic.fna/markers.fasta)
[2024-01-25 19:56:17,897] [INFO] Task started: Blastn
[2024-01-25 19:56:17,897] [INFO] Running command: blastn -query GCF_025345565.1_ASM2534556v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg1a526abf-09a2-4f30-9954-9372c261df98/dqc_reference/reference_markers_gtdb.fasta -out GCF_025345565.1_ASM2534556v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-25 19:56:19,211] [INFO] Task succeeded: Blastn
[2024-01-25 19:56:19,214] [INFO] Selected 8 target genomes.
[2024-01-25 19:56:19,214] [INFO] Target genome list was writen to GCF_025345565.1_ASM2534556v1_genomic.fna/target_genomes_gtdb.txt
[2024-01-25 19:56:19,225] [INFO] Task started: fastANI
[2024-01-25 19:56:19,225] [INFO] Running command: fastANI --query /var/lib/cwl/stge7993a52-a817-4bba-9631-dab8cdb8e7e4/GCF_025345565.1_ASM2534556v1_genomic.fna.gz --refList GCF_025345565.1_ASM2534556v1_genomic.fna/target_genomes_gtdb.txt --output GCF_025345565.1_ASM2534556v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2024-01-25 19:56:26,299] [INFO] Task succeeded: fastANI
[2024-01-25 19:56:26,305] [INFO] Found 8 fastANI hits (0 hits with ANI > circumscription radius)
[2024-01-25 19:56:26,305] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession        gtdb_species               ani      matched_fragments  total_fragments  gtdb_taxonomy                                                                                        ani_circumscription_radius  mean_intra_species_ani  min_intra_species_ani  mean_intra_species_af  min_intra_species_af  num_clustered_genomes  status
GCF_007785795.1  s__UBA2009 sp002335285     92.9082  1203               1355             d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__DSM-6294;g__UBA2009       95.0                        97.54                   97.44                  0.92                   0.92                  6                      -
GCA_002733205.1  s__UBA2009 sp002733205     88.8545  1018               1355             d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__DSM-6294;g__UBA2009       95.0                        96.83                   96.83                  0.84                   0.84                  2                      -
GCA_016132445.1  s__UBA2009 sp016132445     80.0726  557                1355             d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__DSM-6294;g__UBA2009       95.0                        97.98                   97.98                  0.94                   0.94                  2                      -
GCA_002706025.1  s__UBA2009 sp002706025     79.438   488                1355             d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__DSM-6294;g__UBA2009       95.0                        98.78                   97.86                  0.91                   0.85                  7                      -
GCA_002314145.1  s__UBA2009 sp002314145     78.9992  490                1355             d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__DSM-6294;g__UBA2009       95.0                        99.98                   99.96                  0.95                   0.88                  5                      -
GCF_900156675.1  s__Oleibacter marinus      78.3694  227                1355             d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__DSM-6294;g__Oleibacter    95.0                        98.96                   98.96                  0.99                   0.99                  2                      -
GCA_002724925.1  s__Oleibacter sp002724925  78.3574  231                1355             d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__DSM-6294;g__Oleibacter    95.0                        98.64                   97.85                  0.94                   0.89                  5                      -
GCF_000422845.1  s__Oceanobacter kriegii    78.3395  284                1355             d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__DSM-6294;g__Oceanobacter  95.0                        N/A                     N/A                    N/A                    N/A                   1                      -
--------------------------------------------------------------------------------
[2024-01-25 19:56:26,307] [INFO] GTDB search result was written to GCF_025345565.1_ASM2534556v1_genomic.fna/result_gtdb.tsv
[2024-01-25 19:56:26,307] [INFO] ===== GTDB Search completed =====
[2024-01-25 19:56:26,310] [INFO] DFAST_QC result json was written to GCF_025345565.1_ASM2534556v1_genomic.fna/dqc_result.json
[2024-01-25 19:56:26,310] [INFO] DFAST_QC completed!
[2024-01-25 19:56:26,311] [INFO] Total running time: 0h1m5s
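Unlike the taxonomy check, the GTDB search compares each hit's ANI with that species' own `ani_circumscription_radius` rather than one global cutoff. Assuming `result_gtdb.tsv` is tab-separated with the same header the log prints, a short sketch using Python's `csv` module can pick the closest GTDB species and report whether it exceeds its radius. `best_gtdb_hit` is a hypothetical helper, and its strict `>` comparison only mirrors the log's "ANI > circumscription radius" wording; it is not meant as a complete species-assignment rule.

```python
"""Find the closest GTDB species hit and test it against its ANI radius.

Hypothetical sketch: assumes result_gtdb.tsv is tab-separated and uses
the column names printed in the DFAST_QC log.
"""
import csv
from typing import Iterable, Optional


def best_gtdb_hit(lines: Iterable[str]) -> Optional[dict]:
    """Return the highest-ANI row as a small summary dict, or None if empty."""
    best: Optional[dict] = None
    for row in csv.DictReader(lines, delimiter="\t"):
        ani = float(row["ani"])
        if best is not None and ani <= best["ani"]:
            continue  # keep the first row seen at the highest ANI
        radius = float(row["ani_circumscription_radius"])
        matched = int(row["matched_fragments"])
        total = int(row["total_fragments"])
        best = {
            "accession": row["accession"],
            "gtdb_species": row["gtdb_species"],
            "ani": ani,
            "aligned_fraction": matched / total,
            # Strict '>' mirrors "ANI > circumscription radius" in the log.
            "within_radius": ani > radius,
        }
    return best


# Example (path taken from the log):
# with open("GCF_025345565.1_ASM2534556v1_genomic.fna/result_gtdb.tsv", newline="") as fh:
#     print(best_gtdb_hit(fh))
```

For this run the closest hit is GCF_007785795.1 (s__UBA2009 sp002335285) at 92.9082% ANI with 1203 of 1355 fragments matched, an aligned fraction of about 0.89; it stays below the 95.0 radius, consistent with "0 hits with ANI > circumscription radius".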