[2023-06-28 20:26:40,073] [INFO] DFAST_QC pipeline started.
[2023-06-28 20:26:40,076] [INFO] DFAST_QC version: 0.5.7
[2023-06-28 20:26:40,077] [INFO] DQC Reference Directory: /var/lib/cwl/stgbb2e6991-f200-423e-afc3-160ce5daea4e/dqc_reference
[2023-06-28 20:26:41,427] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-28 20:26:41,428] [INFO] Task started: Prodigal
[2023-06-28 20:26:41,429] [INFO] Running command: gunzip -c /var/lib/cwl/stg2f9f7208-b2d7-4d0b-b92e-b1a89162d0c0/GCA_015492455.1_ASM1549245v1_genomic.fna.gz | prodigal -d GCA_015492455.1_ASM1549245v1_genomic.fna/cds.fna -a GCA_015492455.1_ASM1549245v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-28 20:26:47,092] [INFO] Task succeeded: Prodigal
[2023-06-28 20:26:47,092] [INFO] Task started: HMMsearch
[2023-06-28 20:26:47,093] [INFO] Running command: hmmsearch --tblout GCA_015492455.1_ASM1549245v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stgbb2e6991-f200-423e-afc3-160ce5daea4e/dqc_reference/reference_markers.hmm GCA_015492455.1_ASM1549245v1_genomic.fna/protein.faa > /dev/null
[2023-06-28 20:26:47,331] [INFO] Task succeeded: HMMsearch
[2023-06-28 20:26:47,332] [INFO] Found 6/6 markers.
[2023-06-28 20:26:47,365] [INFO] Query marker FASTA was written to GCA_015492455.1_ASM1549245v1_genomic.fna/markers.fasta
[2023-06-28 20:26:47,366] [INFO] Task started: Blastn
[2023-06-28 20:26:47,366] [INFO] Running command: blastn -query GCA_015492455.1_ASM1549245v1_genomic.fna/markers.fasta -db /var/lib/cwl/stgbb2e6991-f200-423e-afc3-160ce5daea4e/dqc_reference/reference_markers.fasta -out GCA_015492455.1_ASM1549245v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-28 20:26:48,259] [INFO] Task succeeded: Blastn
[2023-06-28 20:26:48,264] [INFO] Selected 31 target genomes.
[2023-06-28 20:26:48,264] [INFO] Target genome list was writen to GCA_015492455.1_ASM1549245v1_genomic.fna/target_genomes.txt
[2023-06-28 20:26:48,266] [INFO] Task started: fastANI
[2023-06-28 20:26:48,266] [INFO] Running command: fastANI --query /var/lib/cwl/stg2f9f7208-b2d7-4d0b-b92e-b1a89162d0c0/GCA_015492455.1_ASM1549245v1_genomic.fna.gz --refList GCA_015492455.1_ASM1549245v1_genomic.fna/target_genomes.txt --output GCA_015492455.1_ASM1549245v1_genomic.fna/fastani_result.tsv --threads 1
[2023-06-28 20:27:07,876] [INFO] Task succeeded: fastANI
[2023-06-28 20:27:07,877] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stgbb2e6991-f200-423e-afc3-160ce5daea4e/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-28 20:27:07,877] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stgbb2e6991-f200-423e-afc3-160ce5daea4e/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-28 20:27:07,895] [INFO] Found 22 fastANI hits (0 hits with ANI > threshold)
[2023-06-28 20:27:07,895] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-06-28 20:27:07,895] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Thioalbus denitrificans	strain=DSM 26407	GCA_003337735.1	547122	547122	type	True	77.532	145	570	95	below_threshold
Thiohalospira halophila	strain=HL 3	GCA_900112605.1	381300	381300	type	True	77.4871	106	570	95	below_threshold
Thioalkalivibrio sulfidiphilus	strain=HL-EbGR7	GCA_000021985.1	1033854	1033854	type	True	77.4596	132	570	95	below_threshold
Inmirania thermothiophila	strain=DSM 100275	GCA_003751635.1	1750597	1750597	type	True	77.3802	114	570	95	below_threshold
Thioalkalivibrio thiocyanodenitrificans	strain=ARhD 1	GCA_000378965.1	243063	243063	type	True	77.0425	96	570	95	below_threshold
Thioalkalivibrio denitrificans	strain=ALJD	GCA_002000365.1	108003	108003	type	True	76.9414	104	570	95	below_threshold
Sulfurivermis fontis	strain=JG42	GCA_004001245.1	1972068	1972068	type	True	76.9141	124	570	95	below_threshold
Thiohalobacter thiocyanaticus	strain=Hrh1	GCA_003932505.1	585455	585455	type	True	76.8594	114	570	95	below_threshold
Thiohalophilus thiocyanatoxydans	strain=DSM 16326	GCA_004366735.1	381308	381308	type	True	76.7007	82	570	95	below_threshold
Ectothiorhodospira magna	strain=B7-7	GCA_900110965.1	867345	867345	type	True	76.5295	57	570	95	below_threshold
Azotobacter beijerinckii	strain=DSM 378	GCA_900110885.1	170623	170623	type	True	76.4101	62	570	95	below_threshold
Pseudoxanthomonas taiwanensis	strain=DSM 22914	GCA_010093135.1	176598	176598	type	True	76.3544	68	570	95	below_threshold
Marichromatium gracile	strain=DSM 203	GCA_016583515.1	1048	1048	type	True	76.3374	76	570	95	below_threshold
Marichromatium gracile	strain=DSM 203	GCA_004343155.1	1048	1048	type	True	76.327	77	570	95	below_threshold
Pseudoxanthomonas broegbernensis	strain=DSM 12573	GCA_010093165.1	83619	83619	type	True	76.2839	61	570	95	below_threshold
Marichromatium bheemlicum	strain=DSM 18632	GCA_012276755.1	365339	365339	type	True	76.2738	60	570	95	below_threshold
Acidihalobacter aeolianus	strain=V6	GCA_001753165.1	2792603	2792603	type	True	76.2093	56	570	95	below_threshold
Pseudoxanthomonas broegbernensis	strain=DSM 12573	GCA_014202435.1	83619	83619	type	True	76.2002	60	570	95	below_threshold
Halomonas salipaludis	strain=WRN001	GCA_002286975.1	2032625	2032625	type	True	76.1595	53	570	95	below_threshold
Thioalkalivibrio halophilus	strain=HL17	GCA_001995255.1	252474	252474	type	True	76.1285	63	570	95	below_threshold
Ferrimonas sediminicola	strain=IMCC35001	GCA_005116715.1	2569538	2569538	type	True	76.1256	51	570	95	below_threshold
Pseudomonas mangiferae	strain=DMKU BBB3-04	GCA_007109405.1	2593654	2593654	type	True	75.7985	77	570	95	below_threshold
--------------------------------------------------------------------------------
[2023-06-28 20:27:07,898] [INFO] DFAST Taxonomy check result was written to GCA_015492455.1_ASM1549245v1_genomic.fna/tc_result.tsv
[2023-06-28 20:27:07,899] [INFO] ===== Taxonomy check completed =====
[2023-06-28 20:27:07,899] [INFO] ===== Start completeness check using CheckM =====
[2023-06-28 20:27:07,899] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stgbb2e6991-f200-423e-afc3-160ce5daea4e/dqc_reference/checkm_data
[2023-06-28 20:27:07,900] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-28 20:27:07,933] [INFO] Task started: CheckM
[2023-06-28 20:27:07,933] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_015492455.1_ASM1549245v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_015492455.1_ASM1549245v1_genomic.fna/checkm_input GCA_015492455.1_ASM1549245v1_genomic.fna/checkm_result
[2023-06-28 20:27:30,493] [INFO] Task succeeded: CheckM
[2023-06-28 20:27:30,494] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 95.83%
Contamintation: 0.38%
Strain heterogeneity: 100.00%
--------------------------------------------------------------------------------
[2023-06-28 20:27:30,529] [INFO] ===== Completeness check finished =====
[2023-06-28 20:27:30,529] [INFO] ===== Start GTDB Search =====
[2023-06-28 20:27:30,529] [INFO] Query marker FASTA already exists. Will reuse it. (GCA_015492455.1_ASM1549245v1_genomic.fna/markers.fasta)
[2023-06-28 20:27:30,530] [INFO] Task started: Blastn
[2023-06-28 20:27:30,530] [INFO] Running command: blastn -query GCA_015492455.1_ASM1549245v1_genomic.fna/markers.fasta -db /var/lib/cwl/stgbb2e6991-f200-423e-afc3-160ce5daea4e/dqc_reference/reference_markers_gtdb.fasta -out GCA_015492455.1_ASM1549245v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-28 20:27:31,852] [INFO] Task succeeded: Blastn
[2023-06-28 20:27:31,856] [INFO] Selected 14 target genomes.
[2023-06-28 20:27:31,856] [INFO] Target genome list was writen to GCA_015492455.1_ASM1549245v1_genomic.fna/target_genomes_gtdb.txt
[2023-06-28 20:27:31,857] [INFO] Task started: fastANI
[2023-06-28 20:27:31,857] [INFO] Running command: fastANI --query /var/lib/cwl/stg2f9f7208-b2d7-4d0b-b92e-b1a89162d0c0/GCA_015492455.1_ASM1549245v1_genomic.fna.gz --refList GCA_015492455.1_ASM1549245v1_genomic.fna/target_genomes_gtdb.txt --output GCA_015492455.1_ASM1549245v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-28 20:27:40,365] [INFO] Task succeeded: fastANI
[2023-06-28 20:27:40,383] [INFO] Found 12 fastANI hits (1 hits with ANI > circumscription radius)
[2023-06-28 20:27:40,384] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCA_015490535.1	s__DRMK01 sp015490535	99.2299	501	570	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__DRMK01;f__DRMK01;g__DRMK01	95.0	99.14	99.14	0.88	0.88	2	conclusive
GCA_011322575.1	s__DRMK01 sp011322575	91.0323	289	570	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__DRMK01;f__DRMK01;g__DRMK01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_015491735.1	s__DRMK01 sp015491735	80.7218	186	570	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__DRMK01;f__DRMK01;g__DRMK01	95.0	99.89	99.89	0.95	0.95	2	-
GCA_015488495.1	s__DRMK01 sp015488495	79.7008	271	570	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__DRMK01;f__DRMK01;g__DRMK01	95.0	98.86	98.62	0.82	0.79	5	-
GCA_011051715.1	s__HyVt-443 sp011051715	77.553	128	570	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Sedimenticolaceae;g__HyVt-443	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003337735.1	s__Thioalbus denitrificans	77.5109	146	570	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__DSM-26407;f__DSM-26407;g__Thioalbus	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000021985.1	s__Thioalkalivibrio_A sulfidiphilus	77.4596	132	570	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Ectothiorhodospirales;f__Ectothiorhodospiraceae;g__Thioalkalivibrio_A	95.0	N/A	N/A	N/A	N/A	1	-
GCA_011371455.1	s__DRQN01 sp011371455	77.4587	89	570	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__SZUA-152;f__SZUA-152;g__DRQN01	95.0	97.38	97.35	0.90	0.88	5	-
GCF_003751635.1	s__Inmirania thermothiophila	77.4041	113	570	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__DSM-100275;f__DSM-100275;g__Inmirania	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002356355.1	s__Thiohalobacter thiocyanaticus_A	77.0903	131	570	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Thiohalobacterales;f__Thiohalobacteraceae;g__Thiohalobacter	95.0	98.53	98.53	0.93	0.93	2	-
GCF_000377845.1	s__Thioalkalivibrio sp000377845	76.3465	53	570	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Ectothiorhodospirales;f__Thioalkalivibrionaceae;g__Thioalkalivibrio	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001499735.1	s__Thiocapsa sp001499735	75.8209	55	570	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Chromatiaceae;g__Thiocapsa	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2023-06-28 20:27:40,386] [INFO] GTDB search result was written to GCA_015492455.1_ASM1549245v1_genomic.fna/result_gtdb.tsv
[2023-06-28 20:27:40,387] [INFO] ===== GTDB Search completed =====
[2023-06-28 20:27:40,392] [INFO] DFAST_QC result json was written to GCA_015492455.1_ASM1549245v1_genomic.fna/dqc_result.json
[2023-06-28 20:27:40,392] [INFO] DFAST_QC completed!
[2023-06-28 20:27:40,393] [INFO] Total running time: 0h1m0s