[2023-06-30 03:56:18,725] [INFO] DFAST_QC pipeline started.
[2023-06-30 03:56:18,732] [INFO] DFAST_QC version: 0.5.7
[2023-06-30 03:56:18,732] [INFO] DQC Reference Directory: /var/lib/cwl/stg68efcaac-c549-4a44-83f7-d6242ad74a53/dqc_reference
[2023-06-30 03:56:19,980] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-30 03:56:19,981] [INFO] Task started: Prodigal
[2023-06-30 03:56:19,981] [INFO] Running command: gunzip -c /var/lib/cwl/stg8c8f8f9a-46e1-4d67-996c-b7c11afa176b/GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna.gz | prodigal -d GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/cds.fna -a GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-30 03:56:29,195] [INFO] Task succeeded: Prodigal
[2023-06-30 03:56:29,196] [INFO] Task started: HMMsearch
[2023-06-30 03:56:29,196] [INFO] Running command: hmmsearch --tblout GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg68efcaac-c549-4a44-83f7-d6242ad74a53/dqc_reference/reference_markers.hmm GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/protein.faa > /dev/null
[2023-06-30 03:56:29,451] [INFO] Task succeeded: HMMsearch
[2023-06-30 03:56:29,452] [INFO] Found 6/6 markers.
[2023-06-30 03:56:29,483] [INFO] Query marker FASTA was written to GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/markers.fasta
[2023-06-30 03:56:29,483] [INFO] Task started: Blastn
[2023-06-30 03:56:29,484] [INFO] Running command: blastn -query GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/markers.fasta -db /var/lib/cwl/stg68efcaac-c549-4a44-83f7-d6242ad74a53/dqc_reference/reference_markers.fasta -out GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-30 03:56:30,448] [INFO] Task succeeded: Blastn
[2023-06-30 03:56:30,452] [INFO] Selected 19 target genomes.
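The "Selected 19 target genomes" step follows a BLAST search of the six query markers against the reference marker set with `-outfmt 6 -max_hsps 1 -num_alignments 5`. The log does not show how DFAST_QC turns those hits into a genome list, so the Python sketch below only assumes the simplest reading: collect the distinct reference genomes behind the BLAST subject IDs. The `genome_of` helper and its `<accession>|<marker>` subject-ID format are hypothetical; only the twelve-column order is the standard BLAST tabular layout.

```python
import csv
import io

# Standard BLAST tabular (-outfmt 6) column order.
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def genome_of(sseqid: str) -> str:
    """Hypothetical mapping from a reference marker ID to its genome accession.

    Assumes IDs look like '<genome_accession>|<marker_name>'; the real naming
    scheme of reference_markers.fasta is not shown in the log.
    """
    return sseqid.split("|", 1)[0]


def select_target_genomes(blast_tsv: str) -> list[str]:
    """Return distinct genomes hit by any query marker, in first-seen order."""
    targets: dict[str, None] = {}  # dict keeps insertion order and deduplicates
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        targets.setdefault(genome_of(hit["sseqid"]), None)
    return list(targets)
```

Called on the contents of `blast.markers.tsv`, this would yield one accession per reference genome regardless of how many markers hit it, which is the kind of list that could then be written to `target_genomes.txt` for fastANI.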
[2023-06-30 03:56:30,452] [INFO] Target genome list was writen to GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/target_genomes.txt
[2023-06-30 03:56:30,453] [INFO] Task started: fastANI
[2023-06-30 03:56:30,454] [INFO] Running command: fastANI --query /var/lib/cwl/stg8c8f8f9a-46e1-4d67-996c-b7c11afa176b/GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna.gz --refList GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/target_genomes.txt --output GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/fastani_result.tsv --threads 1
[2023-06-30 03:56:44,547] [INFO] Task succeeded: fastANI
[2023-06-30 03:56:44,548] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg68efcaac-c549-4a44-83f7-d6242ad74a53/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-30 03:56:44,548] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg68efcaac-c549-4a44-83f7-d6242ad74a53/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-30 03:56:44,568] [INFO] Found 19 fastANI hits (0 hits with ANI > threshold)
[2023-06-30 03:56:44,568] [INFO] The taxonomy check result is classified as 'below_threshold'.
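The 'below_threshold' verdict comes from comparing every fastANI hit with an ANI threshold; because the species-specific threshold file was missing, all 19 hits were held to the default of 95. A minimal Python sketch of that comparison follows, assuming fastANI's standard five-column tabular output (query, reference, ANI, bidirectional fragment mappings, total query fragments). Keying overrides by reference path is a hypothetical simplification of the real threshold file, and only the 'below_threshold' outcome seen in this log is modeled; `None` stands in for whatever other status DFAST_QC might assign.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

DEFAULT_ANI_THRESHOLD = 95.0  # the ani_threshold value reported in this run


@dataclass
class AniHit:
    query: str
    reference: str
    ani: float
    matched_fragments: int
    total_fragments: int


def parse_fastani(tsv: str) -> list[AniHit]:
    """Parse fastANI output: query, reference, ANI, mapped fragments, total fragments."""
    hits = []
    for row in csv.reader(io.StringIO(tsv), delimiter="\t"):
        if not row:
            continue
        query, ref, ani, matched, total = row[:5]
        hits.append(AniHit(query, ref, float(ani), int(matched), int(total)))
    return hits


def classify(
    hits: list[AniHit], thresholds: Optional[dict[str, float]] = None
) -> tuple[int, int, Optional[str]]:
    """Count hits with ANI strictly above their threshold (default 95.0).

    `thresholds` is a hypothetical per-reference override; when it is empty,
    every hit falls back to the default, as in the logged run.
    """
    thresholds = thresholds or {}
    above = [h for h in hits if h.ani > thresholds.get(h.reference, DEFAULT_ANI_THRESHOLD)]
    status = "below_threshold" if not above else None  # other statuses not modeled
    return len(hits), len(above), status
```

With the top hit at 80.0104% ANI (405 of 979 fragments matched), nothing clears 95, which matches the logged "0 hits with ANI > threshold".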
[2023-06-30 03:56:44,569] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Sulfurisoma sediminicola	strain=BSN1	GCA_003865015.1	1381557	1381557	type	True	80.0104	405	979	95	below_threshold
Sulfurisoma sediminicola	strain=DSM 26916	GCA_003663955.1	1381557	1381557	type	True	79.9529	402	979	95	below_threshold
Thauera butanivorans	strain=NBRC 103042	GCA_001591165.1	86174	86174	type	True	78.3673	311	979	95	below_threshold
Georgfuchsia toluolica	strain=G5G6	GCA_907163265.1	424218	424218	type	True	78.3669	277	979	95	below_threshold
Sterolibacterium denitrificans	strain=DSM 13999	GCA_001586935.1	157592	157592	type	True	78.3228	115	979	95	below_threshold
Denitratisoma oestradiolicum	strain=DSM 16959	GCA_902813185.1	311182	311182	type	True	78.2408	291	979	95	below_threshold
Denitratisoma oestradiolicum	strain=DSM 16959	GCA_007844305.1	311182	311182	type	True	78.1752	284	979	95	below_threshold
Azoarcus rhizosphaerae	strain=CC-YHH848	GCA_004801305.1	2565932	2565932	type	True	77.9804	300	979	95	below_threshold
Thauera linaloolentis	strain=DSM 12138	GCA_000621305.1	76112	76112	type	True	77.9307	273	979	95	below_threshold
Thauera propionica	strain=KNDSS-Mac4	GCA_002245655.1	2019431	2019431	type	True	77.8111	260	979	95	below_threshold
Thauera phenolivorans	strain=ZV1C	GCA_001696715.1	1792543	1792543	type	True	77.6412	279	979	95	below_threshold
Aromatoleum tolulyticum	strain=ATCC 51758	GCA_900156155.1	34027	34027	type	True	77.5855	288	979	95	below_threshold
Aromatoleum anaerobium	strain=LuFRes1	GCA_012910705.2	182180	182180	type	True	77.5515	243	979	95	below_threshold
Aromatoleum toluolicum	strain=T	GCA_012911005.2	90060	90060	type	True	77.5422	300	979	95	below_threshold
Aromatoleum buckelii	strain=U120	GCA_012910785.2	200254	200254	type	True	77.5041	214	979	95	below_threshold
Ralstonia pseudosolanacearum	strain=LMG 9673	GCA_919586305.1	1310165	1310165	type	True	77.1437	178	979	95	below_threshold
Cupriavidus respiraculi	strain=LMG 21510	GCA_914271545.1	195930	195930	type	True	76.9272	191	979	95	below_threshold
Pseudoduganella aquatica	strain=FT127W	GCA_009857595.1	2660641	2660641	type	True	76.7114	214	979	95	below_threshold
Cupriavidus numazuensis	strain=LMG 26411	GCA_905397435.1	221992	221992	type	True	76.6441	189	979	95	below_threshold
--------------------------------------------------------------------------------
[2023-06-30 03:56:44,571] [INFO] DFAST Taxonomy check result was written to GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/tc_result.tsv
[2023-06-30 03:56:44,572] [INFO] ===== Taxonomy check completed =====
[2023-06-30 03:56:44,572] [INFO] ===== Start completeness check using CheckM =====
[2023-06-30 03:56:44,572] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg68efcaac-c549-4a44-83f7-d6242ad74a53/dqc_reference/checkm_data
[2023-06-30 03:56:44,574] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-30 03:56:44,615] [INFO] Task started: CheckM
[2023-06-30 03:56:44,615] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/checkm_input GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/checkm_result
[2023-06-30 03:57:16,537] [INFO] Task succeeded: CheckM
[2023-06-30 03:57:16,539] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-06-30 03:57:16,564] [INFO] ===== Completeness check finished =====
[2023-06-30 03:57:16,564] [INFO] ===== Start GTDB Search =====
[2023-06-30 03:57:16,564] [INFO] Query marker FASTA already exists. Will reuse it. (GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/markers.fasta)
[2023-06-30 03:57:16,565] [INFO] Task started: Blastn
[2023-06-30 03:57:16,565] [INFO] Running command: blastn -query GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/markers.fasta -db /var/lib/cwl/stg68efcaac-c549-4a44-83f7-d6242ad74a53/dqc_reference/reference_markers_gtdb.fasta -out GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-30 03:57:18,465] [INFO] Task succeeded: Blastn
[2023-06-30 03:57:18,470] [INFO] Selected 11 target genomes.
[2023-06-30 03:57:18,470] [INFO] Target genome list was writen to GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/target_genomes_gtdb.txt
[2023-06-30 03:57:18,471] [INFO] Task started: fastANI
[2023-06-30 03:57:18,471] [INFO] Running command: fastANI --query /var/lib/cwl/stg8c8f8f9a-46e1-4d67-996c-b7c11afa176b/GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna.gz --refList GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/target_genomes_gtdb.txt --output GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-30 03:57:27,001] [INFO] Task succeeded: fastANI
[2023-06-30 03:57:27,014] [INFO] Found 11 fastANI hits (1 hits with ANI > circumscription radius)
[2023-06-30 03:57:27,014] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCA_903848955.1	s__Sulfuritalea sp903848955	99.5742	894	979	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Sulfuritalea	95.0	99.61	99.42	0.91	0.84	24	conclusive
GCA_006218025.1	s__Sulfuritalea sp006218025	84.1338	634	979	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Sulfuritalea	95.0	99.98	99.98	1.00	1.00	2	-
GCF_000828635.1	s__Sulfuritalea hydrogenivorans	83.788	626	979	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Sulfuritalea	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016192435.1	s__Sulfuritalea sp016192435	83.4141	513	979	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Sulfuritalea	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016708585.1	s__Sulfuritalea sp016708585	83.2608	574	979	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Sulfuritalea	95.0	N/A	N/A	N/A	N/A	1	-
GCA_001828935.1	s__Sulfuritalea sp001828935	82.9641	590	979	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Sulfuritalea	95.0	N/A	N/A	N/A	N/A	1	-
GCA_903933395.1	s__Sulfuritalea sp903933395	82.8796	473	979	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Sulfuritalea	95.0	99.21	99.21	0.82	0.82	2	-
GCA_016712985.1	s__Sulfuritalea sp016712985	82.8181	611	979	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Sulfuritalea	95.0	97.95	97.95	0.80	0.80	2	-
GCA_012026675.1	s__Sulfuritalea sp012026675	82.4378	504	979	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Sulfuritalea	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016235805.1	s__Sulfuritalea sp016235805	81.9758	498	979	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Sulfuritalea	95.0	N/A	N/A	N/A	N/A	1	-
GCA_903932755.1	s__Sulfuritalea sp903932755	81.2648	324	979	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Sulfuritalea	95.0	98.78	98.78	0.75	0.75	2	-
--------------------------------------------------------------------------------
[2023-06-30 03:57:27,016] [INFO] GTDB search result was written to GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/result_gtdb.tsv
[2023-06-30 03:57:27,016] [INFO] ===== GTDB Search completed =====
[2023-06-30 03:57:27,020] [INFO] DFAST_QC result json was written to GCA_903830375.1_freshwater_MAG_---_LJ-5-9m_bin-0354_genomic.fna/dqc_result.json
[2023-06-30 03:57:27,020] [INFO] DFAST_QC completed!
[2023-06-30 03:57:27,020] [INFO] Total running time: 0h1m8s
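Per-step timings can be recovered from the entry timestamps. The Python sketch below assumes every entry starts with the `[YYYY-MM-DD HH:MM:SS,mmm] [LEVEL] message` layout used in this log and that two tasks with the same name never run at the same time. It pairs each "Task started" line with the next matching "Task succeeded" line. Measuring from the first to the last entry (03:56:18,725 to 03:57:27,020) gives 68.295 s, which floors to the reported 0h1m8s; whether DFAST_QC measures its total the same way is an assumption.

```python
import re
from datetime import datetime

# Assumed entry layout: "[2023-06-30 03:56:18,725] [INFO] message"
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?P<msg>.*)$")
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # %f accepts the 3-digit millisecond field
STARTED = "Task started: "
SUCCEEDED = "Task succeeded: "


def parse_entries(log_text: str) -> list[tuple[datetime, str, str]]:
    """Return (timestamp, level, message) for every line that starts a log entry.

    Continuation lines (result tables, dashed rules) do not match and are skipped.
    """
    entries = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m["ts"], TS_FORMAT)
            entries.append((ts, m["level"], m["msg"]))
    return entries


def task_durations(entries: list[tuple[datetime, str, str]]) -> list[tuple[str, float]]:
    """Pair 'Task started: X' with the next 'Task succeeded: X'; return seconds elapsed."""
    open_tasks: dict[str, datetime] = {}
    durations = []
    for ts, _level, msg in entries:
        if msg.startswith(STARTED):
            open_tasks[msg[len(STARTED):]] = ts
        elif msg.startswith(SUCCEEDED):
            name = msg[len(SUCCEEDED):]
            start = open_tasks.pop(name, None)
            if start is not None:
                durations.append((name, (ts - start).total_seconds()))
    return durations


def format_hms(seconds: float) -> str:
    """Format whole seconds (fraction dropped) as e.g. '0h1m8s'."""
    s = int(seconds)
    return f"{s // 3600}h{s % 3600 // 60}m{s % 60}s"
```

Applied to this run, the pairing would show that CheckM (about 32 s) and the two fastANI calls dominate the wall-clock time, while Prodigal takes about 9.2 s. Because entries are matched by name, the two Blastn and two fastANI runs are each reported separately in the order they finish.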