[2023-03-18 22:16:20,362] [INFO] DFAST_QC pipeline started.
[2023-03-18 22:16:20,362] [INFO] DFAST_QC version: 0.5.7
[2023-03-18 22:16:20,362] [INFO] DQC Reference Directory: /var/lib/cwl/stgdb91a80a-1880-4ac5-a879-7462ceb0b78c/dqc_reference
[2023-03-18 22:16:21,460] [INFO] ===== Start taxonomy check using ANI =====
[2023-03-18 22:16:21,460] [INFO] Task started: Prodigal
[2023-03-18 22:16:21,461] [INFO] Running command: cat /var/lib/cwl/stg6357a282-5994-4ec6-a160-74236171764a/OceanDNA-b34242.fa | prodigal -d OceanDNA-b34242/cds.fna -a OceanDNA-b34242/protein.faa -g 11 -q > /dev/null
[2023-03-18 22:16:30,347] [INFO] Task succeeded: Prodigal
[2023-03-18 22:16:30,347] [INFO] Task started: HMMsearch
[2023-03-18 22:16:30,347] [INFO] Running command: hmmsearch --tblout OceanDNA-b34242/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stgdb91a80a-1880-4ac5-a879-7462ceb0b78c/dqc_reference/reference_markers.hmm OceanDNA-b34242/protein.faa > /dev/null
[2023-03-18 22:16:30,510] [INFO] Task succeeded: HMMsearch
[2023-03-18 22:16:30,511] [INFO] Found 6/6 markers.
[2023-03-18 22:16:30,528] [INFO] Query marker FASTA was written to OceanDNA-b34242/markers.fasta
[2023-03-18 22:16:30,529] [INFO] Task started: Blastn
[2023-03-18 22:16:30,529] [INFO] Running command: blastn -query OceanDNA-b34242/markers.fasta -db /var/lib/cwl/stgdb91a80a-1880-4ac5-a879-7462ceb0b78c/dqc_reference/reference_markers.fasta -out OceanDNA-b34242/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-03-18 22:16:31,301] [INFO] Task succeeded: Blastn
[2023-03-18 22:16:31,303] [INFO] Selected 28 target genomes.
[2023-03-18 22:16:31,303] [INFO] Target genome list was writen to OceanDNA-b34242/target_genomes.txt
[2023-03-18 22:16:31,322] [INFO] Task started: fastANI
[2023-03-18 22:16:31,322] [INFO] Running command: fastANI --query /var/lib/cwl/stg6357a282-5994-4ec6-a160-74236171764a/OceanDNA-b34242.fa --refList OceanDNA-b34242/target_genomes.txt --output OceanDNA-b34242/fastani_result.tsv --threads 1
[2023-03-18 22:16:46,995] [INFO] Task succeeded: fastANI
[2023-03-18 22:16:46,995] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stgdb91a80a-1880-4ac5-a879-7462ceb0b78c/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-03-18 22:16:46,996] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stgdb91a80a-1880-4ac5-a879-7462ceb0b78c/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-03-18 22:16:47,007] [INFO] Found 19 fastANI hits (0 hits with ANI > threshold)
[2023-03-18 22:16:47,007] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-03-18 22:16:47,007] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Thiohalocapsa marina	strain=DSM 19078	GCA_008632335.1	424902	424902	type	True	77.5215	75	320	95	below_threshold
Marichromatium gracile	strain=DSM 203	GCA_004343155.1	1048	1048	type	True	77.471	84	320	95	below_threshold
Marichromatium gracile	strain=DSM 203	GCA_016583515.1	1048	1048	type	True	77.4333	84	320	95	below_threshold
Marichromatium purpuratum	strain=984	GCA_000224005.3	37487	37487	type	True	77.3107	81	320	95	below_threshold
Marichromatium bheemlicum	strain=DSM 18632	GCA_012276755.1	365339	365339	type	True	76.9371	69	320	95	below_threshold
Thiohalobacter thiocyanaticus	strain=Hrh1	GCA_003932505.1	585455	585455	type	True	76.8564	69	320	95	below_threshold
Plasticicumulans lactativorans	strain=DSM 25287	GCA_004341245.1	1133106	1133106	type	True	76.8418	65	320	95	below_threshold
Thioalkalivibrio sulfidiphilus	strain=HL-EbGR7	GCA_000021985.1	1033854	1033854	type	True	76.8227	80	320	95	below_threshold
Pseudomonas lalucatii	strain=R1b54	GCA_018398425.1	1424203	1424203	type	True	76.644	57	320	95	below_threshold
Allochromatium vinosum	strain=DSM 180	GCA_000025485.1	1049	1049	type	True	76.5639	63	320	95	below_threshold
Pseudomonas cavernae	strain=K2W31S-8	GCA_003595175.1	2320867	2320867	type	True	76.535	54	320	95	below_threshold
Allochromatium humboldtianum	strain=DSM 21881	GCA_013385175.1	504901	504901	type	True	76.5018	59	320	95	below_threshold
Thioflavicoccus mobilis	strain=8321	GCA_000327045.1	80679	80679	type	True	76.4076	54	320	95	below_threshold
Halomonas smyrnensis	strain=AAD6	GCA_000265245.2	720605	720605	type	True	76.2788	56	320	95	below_threshold
Halomonas halophila	strain=NBRC 102604	GCA_007989465.1	29573	29573	type	True	76.2399	54	320	95	below_threshold
Pseudomonas aeruginosa	strain=JCM 5962	GCA_022496575.1	287	287	type	True	76.1344	63	320	95	below_threshold
Pseudomonas tumuqii	strain=LAMW06	GCA_013184545.1	2715755	2715755	type	True	76.0193	58	320	95	below_threshold
Pseudomonas campi	strain=S1-A32-2	GCA_013200955.2	2731681	2731681	type	True	75.9819	50	320	95	below_threshold
Pseudomonas muyukensis	strain=COW39	GCA_019139535.1	2842357	2842357	type	True	75.7921	56	320	95	below_threshold
--------------------------------------------------------------------------------
[2023-03-18 22:16:47,008] [INFO] DFAST Taxonomy check result was written to OceanDNA-b34242/tc_result.tsv
[2023-03-18 22:16:47,008] [INFO] ===== Taxonomy check completed =====
[2023-03-18 22:16:47,008] [INFO] ===== Start completeness check using CheckM =====
[2023-03-18 22:16:47,008] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stgdb91a80a-1880-4ac5-a879-7462ceb0b78c/dqc_reference/checkm_data
[2023-03-18 22:16:47,009] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-03-18 22:16:47,013] [INFO] Task started: CheckM
[2023-03-18 22:16:47,014] [INFO] Running command: checkm taxonomy_wf --tab_table -f OceanDNA-b34242/cc_result.tsv -t 1 life "Prokaryote" OceanDNA-b34242/checkm_input OceanDNA-b34242/checkm_result
[2023-03-18 22:17:12,525] [INFO] Task succeeded: CheckM
[2023-03-18 22:17:12,525] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 50.00%
Contamintation: 4.17%
Strain heterogeneity: 100.00%
--------------------------------------------------------------------------------
[2023-03-18 22:17:12,527] [INFO] ===== Completeness check finished =====
[2023-03-18 22:17:12,527] [INFO] ===== Start GTDB Search =====
[2023-03-18 22:17:12,528] [INFO] Query marker FASTA already exists. Will reuse it. (OceanDNA-b34242/markers.fasta)
[2023-03-18 22:17:12,529] [INFO] Task started: Blastn
[2023-03-18 22:17:12,529] [INFO] Running command: blastn -query OceanDNA-b34242/markers.fasta -db /var/lib/cwl/stgdb91a80a-1880-4ac5-a879-7462ceb0b78c/dqc_reference/reference_markers_gtdb.fasta -out OceanDNA-b34242/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-03-18 22:17:13,893] [INFO] Task succeeded: Blastn
[2023-03-18 22:17:13,894] [INFO] Selected 23 target genomes.
[2023-03-18 22:17:13,894] [INFO] Target genome list was writen to OceanDNA-b34242/target_genomes_gtdb.txt
[2023-03-18 22:17:13,914] [INFO] Task started: fastANI
[2023-03-18 22:17:13,914] [INFO] Running command: fastANI --query /var/lib/cwl/stg6357a282-5994-4ec6-a160-74236171764a/OceanDNA-b34242.fa --refList OceanDNA-b34242/target_genomes_gtdb.txt --output OceanDNA-b34242/fastani_result_gtdb.tsv --threads 1
[2023-03-18 22:17:25,762] [INFO] Task succeeded: fastANI
[2023-03-18 22:17:25,772] [INFO] Found 16 fastANI hits (0 hits with ANI > circumscription radius)
[2023-03-18 22:17:25,772] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_000801295.1	s__MONJU sp000801295	78.8667	147	320	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Sedimenticolaceae;g__MONJU	95.0	N/A	N/A	N/A	N/A	1	-
GCA_002009425.1	s__41T-STBD-0c-01a sp002009425	78.2165	122	320	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Sedimenticolaceae;g__41T-STBD-0c-01a	95.0	N/A	N/A	N/A	N/A	1	-
GCF_008632335.1	s__Thiohalocapsa marina	77.5215	75	320	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Chromatiaceae;g__Thiohalocapsa	95.0	N/A	N/A	N/A	N/A	1	-
GCA_003696905.1	s__MONJU sp003696905	77.3365	89	320	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Sedimenticolaceae;g__MONJU	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002356355.1	s__Thiohalobacter thiocyanaticus_A	77.2288	76	320	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Thiohalobacterales;f__Thiohalobacteraceae;g__Thiohalobacter	95.0	98.53	98.53	0.93	0.93	2	-
GCA_003562815.1	s__Halochromatium sp003562815	77.0376	50	320	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Chromatiaceae;g__Halochromatium	95.0	99.58	99.39	0.91	0.88	4	-
GCA_015231665.1	s__JADGBD01 sp015231665	77.0223	56	320	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Sedimenticolaceae;g__JADGBD01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_011051715.1	s__HyVt-443 sp011051715	76.9784	79	320	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Sedimenticolaceae;g__HyVt-443	95.0	N/A	N/A	N/A	N/A	1	-
GCF_012276755.1	s__Marichromatium bheemlicum	76.9371	69	320	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Chromatiaceae;g__Marichromatium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_004341245.1	s__Plasticicumulans_A lactativorans	76.8418	65	320	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Competibacterales;f__Competibacteraceae;g__Plasticicumulans_A	95.0	N/A	N/A	N/A	N/A	1	-
GCA_009497875.2	s__JADGBD01 sp009497875	76.7909	73	320	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Sedimenticolaceae;g__JADGBD01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_003972985.1	s__Thiolapillus sp003972985	76.7229	66	320	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Sedimenticolaceae;g__Thiolapillus	95.0	99.52	99.52	0.88	0.88	2	-
GCF_015070855.1	s__Pseudomonas_A lopnurensis	76.5559	56	320	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas_A	95.0	98.76	98.41	0.83	0.83	4	-
GCF_000377785.1	s__Thioalkalivibrio sp000377785	76.5469	50	320	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Ectothiorhodospirales;f__Thioalkalivibrionaceae;g__Thioalkalivibrio	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000265245.1	s__Halomonas smyrnensis	76.2231	58	320	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Halomonadaceae;g__Halomonas	95.0	98.45	98.45	0.89	0.89	2	-
GCF_007989465.1	s__Halomonas halophila	76.1836	56	320	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Halomonadaceae;g__Halomonas	95.0	98.43	97.59	0.92	0.89	4	-
--------------------------------------------------------------------------------
[2023-03-18 22:17:25,773] [INFO] GTDB search result was written to OceanDNA-b34242/result_gtdb.tsv
[2023-03-18 22:17:25,773] [INFO] ===== GTDB Search completed =====
[2023-03-18 22:17:25,775] [INFO] DFAST_QC result json was written to OceanDNA-b34242/dqc_result.json
[2023-03-18 22:17:25,775] [INFO] DFAST_QC completed!
[2023-03-18 22:17:25,775] [INFO] Total running time: 0h1m5s