[2023-03-16 16:26:08,050] [INFO] DFAST_QC pipeline started.
[2023-03-16 16:26:08,050] [INFO] DFAST_QC version: 0.5.7
[2023-03-16 16:26:08,050] [INFO] DQC Reference Directory: /var/lib/cwl/stg57252072-da36-4a03-b020-648a46794e17/dqc_reference
[2023-03-16 16:26:09,223] [INFO] ===== Start taxonomy check using ANI =====
[2023-03-16 16:26:09,224] [INFO] Task started: Prodigal
[2023-03-16 16:26:09,224] [INFO] Running command: cat /var/lib/cwl/stga8f81936-7d8d-40fc-9818-479c6b084bf5/OceanDNA-b27593.fa | prodigal -d OceanDNA-b27593/cds.fna -a OceanDNA-b27593/protein.faa -g 11 -q > /dev/null
[2023-03-16 16:26:30,967] [INFO] Task succeeded: Prodigal
[2023-03-16 16:26:30,968] [INFO] Task started: HMMsearch
[2023-03-16 16:26:30,968] [INFO] Running command: hmmsearch --tblout OceanDNA-b27593/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg57252072-da36-4a03-b020-648a46794e17/dqc_reference/reference_markers.hmm OceanDNA-b27593/protein.faa > /dev/null
[2023-03-16 16:26:31,165] [INFO] Task succeeded: HMMsearch
[2023-03-16 16:26:31,166] [WARNING] Found 4/6 markers. [/var/lib/cwl/stga8f81936-7d8d-40fc-9818-479c6b084bf5/OceanDNA-b27593.fa]
[2023-03-16 16:26:31,193] [INFO] Query marker FASTA was written to OceanDNA-b27593/markers.fasta
[2023-03-16 16:26:31,193] [INFO] Task started: Blastn
[2023-03-16 16:26:31,193] [INFO] Running command: blastn -query OceanDNA-b27593/markers.fasta -db /var/lib/cwl/stg57252072-da36-4a03-b020-648a46794e17/dqc_reference/reference_markers.fasta -out OceanDNA-b27593/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-03-16 16:26:31,670] [INFO] Task succeeded: Blastn
[2023-03-16 16:26:31,671] [INFO] Selected 20 target genomes.
[2023-03-16 16:26:31,671] [INFO] Target genome list was writen to OceanDNA-b27593/target_genomes.txt
[2023-03-16 16:26:31,707] [INFO] Task started: fastANI
[2023-03-16 16:26:31,707] [INFO] Running command: fastANI --query /var/lib/cwl/stga8f81936-7d8d-40fc-9818-479c6b084bf5/OceanDNA-b27593.fa --refList OceanDNA-b27593/target_genomes.txt --output OceanDNA-b27593/fastani_result.tsv --threads 1
[2023-03-16 16:26:44,661] [INFO] Task succeeded: fastANI
[2023-03-16 16:26:44,661] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg57252072-da36-4a03-b020-648a46794e17/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-03-16 16:26:44,661] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg57252072-da36-4a03-b020-648a46794e17/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-03-16 16:26:44,673] [INFO] Found 20 fastANI hits (0 hits with ANI > threshold)
[2023-03-16 16:26:44,673] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-03-16 16:26:44,673] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Brevirhabdus pacifica	strain=22DY15	GCA_002094875.1	1267768	1267768	type	True	76.6097	93	885	95	below_threshold
Ruegeria intermedia	strain=DSM 29341	GCA_900129345.1	996115	996115	type	True	76.6047	131	885	95	below_threshold
Kangsaoukella pontilimi	strain=GH1-50	GCA_009830125.1	2691042	2691042	type	True	76.5982	105	885	95	below_threshold
Brevirhabdus pacifica	strain=DSM 27767	GCA_002797755.1	1267768	1267768	type	True	76.5767	109	885	95	below_threshold
Ruegeria marisrubri	strain=ZGT118	GCA_001507595.1	1685379	1685379	type	True	76.5509	128	885	95	below_threshold
Pontivivens ytuae	strain=MT2928	GCA_015679265.1	2789856	2789856	type	True	76.3802	122	885	95	below_threshold
Jannaschia seohaensis	strain=DSM 25227	GCA_900116765.1	475081	475081	type	True	76.1814	131	885	95	below_threshold
Jannaschia seohaensis	strain=DSM 25227	GCA_003149265.1	475081	475081	type	True	76.1814	131	885	95	below_threshold
Cereibacter changlensis	strain=JA139	GCA_003034985.1	402884	402884	type	True	76.1708	143	885	95	below_threshold
Thioclava electrotropha	strain=Elox9	GCA_002085925.2	1549850	1549850	type	True	76.1343	87	885	95	below_threshold
Cereibacter changlensis	strain=DSM 18774	GCA_003254335.1	402884	402884	type	True	76.1184	149	885	95	below_threshold
Paroceanicella profunda	strain=D4M1	GCA_005887635.2	2579971	2579971	type	True	76.1043	143	885	95	below_threshold
Mangrovicoccus ximenensis	strain=T1lg56	GCA_003056725.1	1911570	1911570	type	True	76.027	129	885	95	below_threshold
Cereibacter johrii	strain=JA192	GCA_001720585.1	445629	445629	type	True	76.012	117	885	95	below_threshold
Cereibacter johrii	strain=JA192	GCA_003046325.1	445629	445629	type	True	76.0106	119	885	95	below_threshold
Paracoccus acridae	strain=CGMCC 1.15419	GCA_014642735.1	1795310	1795310	type	True	75.9766	107	885	95	below_threshold
Rubrimonas cliftonensis	strain=DSM 15345	GCA_900107585.1	89524	89524	type	True	75.8924	168	885	95	below_threshold
Rubellimicrobium aerolatum	strain=DSM 19297	GCA_017872975.1	490979	490979	type	True	75.8269	102	885	95	below_threshold
Rhodobacter tardus	strain=CYK-10	GCA_009925085.1	2699202	2699202	type	True	75.6813	84	885	95	below_threshold
Kaistia hirudinis	strain=DSM 25966	GCA_014196455.1	1293440	1293440	type	True	75.5053	78	885	95	below_threshold
--------------------------------------------------------------------------------
[2023-03-16 16:26:44,673] [INFO] DFAST Taxonomy check result was written to OceanDNA-b27593/tc_result.tsv
[2023-03-16 16:26:44,673] [INFO] ===== Taxonomy check completed =====
[2023-03-16 16:26:44,673] [INFO] ===== Start completeness check using CheckM =====
[2023-03-16 16:26:44,673] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg57252072-da36-4a03-b020-648a46794e17/dqc_reference/checkm_data
[2023-03-16 16:26:44,674] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-03-16 16:26:44,680] [INFO] Task started: CheckM
[2023-03-16 16:26:44,680] [INFO] Running command: checkm taxonomy_wf --tab_table -f OceanDNA-b27593/cc_result.tsv -t 1 life "Prokaryote" OceanDNA-b27593/checkm_input OceanDNA-b27593/checkm_result
[2023-03-16 16:27:44,558] [INFO] Task succeeded: CheckM
[2023-03-16 16:27:44,558] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 70.43%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-03-16 16:27:44,563] [INFO] ===== Completeness check finished =====
[2023-03-16 16:27:44,563] [INFO] ===== Start GTDB Search =====
[2023-03-16 16:27:44,563] [INFO] Query marker FASTA already exists. Will reuse it. (OceanDNA-b27593/markers.fasta)
[2023-03-16 16:27:44,565] [INFO] Task started: Blastn
[2023-03-16 16:27:44,565] [INFO] Running command: blastn -query OceanDNA-b27593/markers.fasta -db /var/lib/cwl/stg57252072-da36-4a03-b020-648a46794e17/dqc_reference/reference_markers_gtdb.fasta -out OceanDNA-b27593/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-03-16 16:27:46,221] [INFO] Task succeeded: Blastn
[2023-03-16 16:27:46,222] [INFO] Selected 20 target genomes.
[2023-03-16 16:27:46,222] [INFO] Target genome list was writen to OceanDNA-b27593/target_genomes_gtdb.txt
[2023-03-16 16:27:46,539] [INFO] Task started: fastANI
[2023-03-16 16:27:46,539] [INFO] Running command: fastANI --query /var/lib/cwl/stga8f81936-7d8d-40fc-9818-479c6b084bf5/OceanDNA-b27593.fa --refList OceanDNA-b27593/target_genomes_gtdb.txt --output OceanDNA-b27593/fastani_result_gtdb.tsv --threads 1
[2023-03-16 16:28:08,497] [INFO] Task succeeded: fastANI
[2023-03-16 16:28:08,509] [INFO] Found 20 fastANI hits (0 hits with ANI > circumscription radius)
[2023-03-16 16:28:08,509] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_900129345.1	s__Ruegeria intermedia	76.5889	132	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ruegeria	95.0	N/A	N/A	N/A	N/A	1	-
GCA_009830125.1	s__GH1-50 sp009830125	76.5786	106	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__GH1-50	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002797755.1	s__Brevirhabdus pacifica	76.5767	109	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Brevirhabdus	95.0	99.99	99.97	0.99	0.98	4	-
GCF_001507595.1	s__Ruegeria marisrubri	76.5509	128	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ruegeria	95.0	N/A	N/A	N/A	N/A	1	-
GCF_019104745.1	s__CAU-1522 sp019104745	76.5397	140	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__CAU-1522	95.0	N/A	N/A	N/A	N/A	1	-
GCA_011620265.1	s__Albidovulum sp011620265	76.4291	138	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Albidovulum	95.0	N/A	N/A	N/A	N/A	1	-
GCF_015679265.1	s__MT2928 sp015679265	76.3802	122	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__MT2928	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000018145.1	s__Dinoroseobacter shibae	76.3208	103	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Dinoroseobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003008555.2	s__Pukyongiella litopenaei	76.232	148	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Pukyongiella	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002814095.1	s__Sagittula sp002814095	76.1912	129	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Sagittula	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003034985.1	s__Cereibacter changlensis	76.1832	142	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Cereibacter	95.0	98.07	96.18	0.89	0.78	3	-
GCF_005887635.2	s__Paroceanicella profunda	76.1259	141	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Paroceanicella	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002085925.2	s__Thioclava electrotropha	76.0904	88	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Thioclava	96.2561	N/A	N/A	N/A	N/A	1	-
GCF_009296265.1	s__Thioclava sp009296265	76.0509	101	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Thioclava	96.2474	N/A	N/A	N/A	N/A	1	-
GCF_001720585.1	s__Cereibacter_A johrii	76.012	117	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Cereibacter_A	95.1736	98.42	97.86	0.94	0.91	6	-
GCF_003056725.1	s__Mangrovicoccus ximenensis	75.993	132	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Mangrovicoccus	95.0	N/A	N/A	N/A	N/A	1	-
GCF_014642735.1	s__Paracoccus acridae	75.9766	107	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Paracoccus	95.0	97.80	97.80	0.86	0.86	2	-
GCF_003390865.1	s__HLUCCA09 sp003390865	75.9143	85	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__HLUCCA09	95.0	N/A	N/A	N/A	N/A	1	-
GCA_009925085.1	s__CYK-10 sp009925085	75.6813	84	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__CYK-10	95.0	N/A	N/A	N/A	N/A	1	-
GCA_002298965.1	s__Pinisolibacter sp002298965	75.4788	75	885	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Ancalomicrobiaceae;g__Pinisolibacter	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2023-03-16 16:28:08,509] [INFO] GTDB search result was written to OceanDNA-b27593/result_gtdb.tsv
[2023-03-16 16:28:08,510] [INFO] ===== GTDB Search completed =====
[2023-03-16 16:28:08,512] [INFO] DFAST_QC result json was written to OceanDNA-b27593/dqc_result.json
[2023-03-16 16:28:08,512] [INFO] DFAST_QC completed!
[2023-03-16 16:28:08,512] [INFO] Total running time: 0h2m0s