[2023-03-18 22:16:20,349] [INFO] DFAST_QC pipeline started.
[2023-03-18 22:16:20,350] [INFO] DFAST_QC version: 0.5.7
[2023-03-18 22:16:20,350] [INFO] DQC Reference Directory: /var/lib/cwl/stg8e20e66b-211a-4c14-b67a-03dedb033643/dqc_reference
[2023-03-18 22:16:21,499] [INFO] ===== Start taxonomy check using ANI =====
[2023-03-18 22:16:21,499] [INFO] Task started: Prodigal
[2023-03-18 22:16:21,499] [INFO] Running command: cat /var/lib/cwl/stg1ac5e5c3-b8ec-45af-b7d3-790d84ad41c4/OceanDNA-b6207.fa | prodigal -d OceanDNA-b6207/cds.fna -a OceanDNA-b6207/protein.faa -g 11 -q > /dev/null
[2023-03-18 22:16:47,116] [INFO] Task succeeded: Prodigal
[2023-03-18 22:16:47,117] [INFO] Task started: HMMsearch
[2023-03-18 22:16:47,117] [INFO] Running command: hmmsearch --tblout OceanDNA-b6207/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg8e20e66b-211a-4c14-b67a-03dedb033643/dqc_reference/reference_markers.hmm OceanDNA-b6207/protein.faa > /dev/null
[2023-03-18 22:16:47,348] [INFO] Task succeeded: HMMsearch
[2023-03-18 22:16:47,348] [INFO] Found 6/6 markers.
[2023-03-18 22:16:47,376] [INFO] Query marker FASTA was written to OceanDNA-b6207/markers.fasta
[2023-03-18 22:16:47,376] [INFO] Task started: Blastn
[2023-03-18 22:16:47,376] [INFO] Running command: blastn -query OceanDNA-b6207/markers.fasta -db /var/lib/cwl/stg8e20e66b-211a-4c14-b67a-03dedb033643/dqc_reference/reference_markers.fasta -out OceanDNA-b6207/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-03-18 22:16:47,951] [INFO] Task succeeded: Blastn
[2023-03-18 22:16:47,952] [INFO] Selected 30 target genomes.
[2023-03-18 22:16:47,952] [INFO] Target genome list was writen to OceanDNA-b6207/target_genomes.txt
[2023-03-18 22:16:47,964] [INFO] Task started: fastANI
[2023-03-18 22:16:47,964] [INFO] Running command: fastANI --query /var/lib/cwl/stg1ac5e5c3-b8ec-45af-b7d3-790d84ad41c4/OceanDNA-b6207.fa --refList OceanDNA-b6207/target_genomes.txt --output OceanDNA-b6207/fastani_result.tsv --threads 1
[2023-03-18 22:17:04,943] [INFO] Task succeeded: fastANI
[2023-03-18 22:17:04,943] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg8e20e66b-211a-4c14-b67a-03dedb033643/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-03-18 22:17:04,944] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg8e20e66b-211a-4c14-b67a-03dedb033643/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-03-18 22:17:04,956] [INFO] Found 21 fastANI hits (0 hits with ANI > threshold)
[2023-03-18 22:17:04,956] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-03-18 22:17:04,956] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name strain accession taxid species_taxid relation_to_type validated ani matched_fragments total_fragments ani_threshold status
Abyssalbus ytuae strain=MT3330 GCA_022807975.1 2926907 2926907 type True 76.3782 88 1262 95 below_threshold
Joostella marina strain=DSM 19592 GCA_000260115.1 453852 453852 type True 75.9016 81 1262 95 below_threshold
Mesoflavibacter zeaxanthinifaciens strain=DSM 18436 GCA_000422365.1 393060 393060 type True 75.8071 60 1262 95 below_threshold
Mesoflavibacter profundi strain=YC1039 GCA_014764305.1 2708110 2708110 type True 75.7905 60 1262 95 below_threshold
Maribacter caenipelagi strain=CECT 8455 GCA_004364175.1 1447781 1447781 type True 75.7715 54 1262 95 below_threshold
Algibacter marinivivus strain=ZY111 GCA_003143755.1 2100723 2100723 type True 75.7674 57 1262 95 below_threshold
Cellulophaga algicola strain=DSM 14237 GCA_000186265.1 59600 59600 type True 75.7485 64 1262 95 below_threshold
Algibacter pacificus strain=H164 GCA_008033385.1 2599389 2599389 type True 75.7204 66 1262 95 below_threshold
Formosa sediminum strain=PS13 GCA_007197735.1 2594004 2594004 type True 75.6937 62 1262 95 below_threshold
Joostella atrarenae strain=M1-2 GCA_021764745.1 679257 679257 type True 75.6876 61 1262 95 below_threshold
Mesoflavibacter zeaxanthinifaciens subsp. sabulilitoris strain=KCTC 42117 GCA_003008435.1 1520893 393060 type True 75.6836 59 1262 95 below_threshold
Pustulibacterium marinum strain=CGMCC 1.12333 GCA_900116665.1 1224947 1224947 type True 75.6823 66 1262 95 below_threshold
Kordia algicida strain=OT-1 GCA_000154725.1 221066 221066 type True 75.6772 69 1262 95 below_threshold
Maribacter spongiicola strain=DSM 25233 GCA_004364165.1 1206753 1206753 type True 75.6753 58 1262 95 below_threshold
Maribacter litoralis strain=SDRB-Phe2 GCA_003075045.1 2059726 2059726 type True 75.6747 69 1262 95 below_threshold
Kordia jejudonensis strain=SSK3-3 GCA_001005315.1 1348245 1348245 type True 75.6535 60 1262 95 below_threshold
Mesoflavibacter zeaxanthinifaciens subsp. sabulilitoris strain=CECT 8597 GCA_014191595.1 1520893 393060 type True 75.5581 58 1262 95 below_threshold
Kordia antarctica strain=IMCC3317 GCA_009901525.1 1218801 1218801 type True 75.5303 63 1262 95 below_threshold
Yeosuana aromativorans strain=JCM 12862 GCA_014646655.1 288019 288019 type True 75.4488 53 1262 95 below_threshold
Algibacter alginicilyticus strain=HZ22 GCA_001310225.1 1736674 1736674 type True 75.4324 52 1262 95 below_threshold
Mesonia oceanica strain=ISS653 GCA_902499555.1 2687242 2687242 type True 75.2508 62 1262 95 below_threshold
--------------------------------------------------------------------------------
[2023-03-18 22:17:04,956] [INFO] DFAST Taxonomy check result was written to OceanDNA-b6207/tc_result.tsv
[2023-03-18 22:17:04,957] [INFO] ===== Taxonomy check completed =====
[2023-03-18 22:17:04,957] [INFO] ===== Start completeness check using CheckM =====
[2023-03-18 22:17:04,957] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg8e20e66b-211a-4c14-b67a-03dedb033643/dqc_reference/checkm_data
[2023-03-18 22:17:04,958] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-03-18 22:17:04,964] [INFO] Task started: CheckM
[2023-03-18 22:17:04,964] [INFO] Running command: checkm taxonomy_wf --tab_table -f OceanDNA-b6207/cc_result.tsv -t 1 life "Prokaryote" OceanDNA-b6207/checkm_input OceanDNA-b6207/checkm_result
[2023-03-18 22:18:08,330] [INFO] Task succeeded: CheckM
[2023-03-18 22:18:08,330] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 87.50%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-03-18 22:18:08,333] [INFO] ===== Completeness check finished =====
[2023-03-18 22:18:08,334] [INFO] ===== Start GTDB Search =====
[2023-03-18 22:18:08,334] [INFO] Query marker FASTA already exists. Will reuse it. (OceanDNA-b6207/markers.fasta)
[2023-03-18 22:18:08,335] [INFO] Task started: Blastn
[2023-03-18 22:18:08,335] [INFO] Running command: blastn -query OceanDNA-b6207/markers.fasta -db /var/lib/cwl/stg8e20e66b-211a-4c14-b67a-03dedb033643/dqc_reference/reference_markers_gtdb.fasta -out OceanDNA-b6207/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-03-18 22:18:09,182] [INFO] Task succeeded: Blastn
[2023-03-18 22:18:09,183] [INFO] Selected 30 target genomes.
[2023-03-18 22:18:09,183] [INFO] Target genome list was writen to OceanDNA-b6207/target_genomes_gtdb.txt
[2023-03-18 22:18:09,211] [INFO] Task started: fastANI
[2023-03-18 22:18:09,212] [INFO] Running command: fastANI --query /var/lib/cwl/stg1ac5e5c3-b8ec-45af-b7d3-790d84ad41c4/OceanDNA-b6207.fa --refList OceanDNA-b6207/target_genomes_gtdb.txt --output OceanDNA-b6207/fastani_result_gtdb.tsv --threads 1
[2023-03-18 22:18:26,832] [INFO] Task succeeded: fastANI
[2023-03-18 22:18:26,846] [INFO] Found 20 fastANI hits (0 hits with ANI > circumscription radius)
[2023-03-18 22:18:26,846] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession gtdb_species ani matched_fragments total_fragments gtdb_taxonomy ani_circumscription_radius mean_intra_species_ani min_intra_species_ani mean_intra_species_af min_intra_species_af num_clustered_genomes status
GCA_009901575.1 s__Leptobacterium sp009901575 76.1161 108 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Leptobacterium 95.0 N/A N/A N/A N/A 1 -
GCF_000260115.1 s__Joostella marina 75.9016 81 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Joostella 95.0 N/A N/A N/A N/A 1 -
GCF_000422365.1 s__Mesoflavibacter zeaxanthinifaciens 75.8071 60 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Mesoflavibacter 95.0 97.07 97.05 0.91 0.91 3 -
GCF_014764305.1 s__Mesoflavibacter profundi 75.7905 60 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Mesoflavibacter 95.0 99.53 99.53 0.98 0.98 3 -
GCF_004364175.1 s__Maribacter caenipelagi 75.7715 54 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Maribacter 95.0 N/A N/A N/A N/A 1 -
GCF_003143755.1 s__Algibacter_B sp003143755 75.7674 57 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Algibacter_B 95.0 N/A N/A N/A N/A 1 -
GCF_000186265.1 s__Cellulophaga algicola 75.7485 64 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Cellulophaga 95.0 98.35 98.34 0.88 0.87 3 -
GCF_900176415.1 s__Cellulophaga tyrosinoxydans 75.7023 53 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Cellulophaga 95.0 N/A N/A N/A N/A 1 -
GCF_007197735.1 s__Formosa sediminum 75.6937 62 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Formosa 95.0 N/A N/A N/A N/A 1 -
GCF_014641635.1 s__Aquaticitalea lipolytica 75.6909 56 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Aquaticitalea 95.0 N/A N/A N/A N/A 1 -
GCF_000190595.1 s__Cellulophaga lytica 75.6845 57 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Cellulophaga 95.0 99.01 98.71 0.94 0.93 6 -
GCF_900116665.1 s__Pustulibacterium marinum 75.6823 66 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Pustulibacterium 95.0 N/A N/A N/A N/A 1 -
GCF_000154725.1 s__Kordia algicida 75.6772 69 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Kordia 95.0 N/A N/A N/A N/A 1 -
GCF_004364165.1 s__Maribacter spongiicola 75.6753 58 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Maribacter 95.0 N/A N/A N/A N/A 1 -
GCA_002746415.1 s__Saonia sp002746415 75.6706 54 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Saonia 95.0 N/A N/A N/A N/A 1 -
GCF_003075045.1 s__Maribacter litoralis 75.6578 70 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Maribacter 95.0 97.43 97.43 0.90 0.90 2 -
GCF_004366715.1 s__Meridianimaribacter flavus 75.6291 59 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Meridianimaribacter 95.0 97.44 97.06 0.91 0.87 3 -
GCF_016734785.1 s__Galbibacter_A mesophilus 75.5232 72 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Galbibacter_A 95.0 N/A N/A N/A N/A 1 -
GCF_001999725.1 s__Cellulophaga omnivescoria 75.4932 65 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Cellulophaga 95.0 N/A N/A N/A N/A 1 -
GCF_014646655.1 s__Yeosuana aromativorans 75.4488 53 1262 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Flavobacteriales;f__Flavobacteriaceae;g__Yeosuana 95.0 N/A N/A N/A N/A 1 -
--------------------------------------------------------------------------------
[2023-03-18 22:18:26,847] [INFO] GTDB search result was written to OceanDNA-b6207/result_gtdb.tsv
[2023-03-18 22:18:26,847] [INFO] ===== GTDB Search completed =====
[2023-03-18 22:18:26,850] [INFO] DFAST_QC result json was written to OceanDNA-b6207/dqc_result.json
[2023-03-18 22:18:26,850] [INFO] DFAST_QC completed!
[2023-03-18 22:18:26,851] [INFO] Total running time: 0h2m7s