[2024-01-24 13:30:11,270] [INFO] DFAST_QC pipeline started.
[2024-01-24 13:30:11,272] [INFO] DFAST_QC version: 0.5.7
[2024-01-24 13:30:11,272] [INFO] DQC Reference Directory: /var/lib/cwl/stgb947a371-56a7-4ef3-a05f-8dba3afb7016/dqc_reference
[2024-01-24 13:30:12,496] [INFO] ===== Start taxonomy check using ANI =====
[2024-01-24 13:30:12,497] [INFO] Task started: Prodigal
[2024-01-24 13:30:12,498] [INFO] Running command: gunzip -c /var/lib/cwl/stgd2d7e9f4-0b87-4a9e-810a-0bd8a8816394/GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna.gz | prodigal -d GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/cds.fna -a GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/protein.faa -g 11 -q > /dev/null
[2024-01-24 13:30:23,712] [INFO] Task succeeded: Prodigal
[2024-01-24 13:30:23,713] [INFO] Task started: HMMsearch
[2024-01-24 13:30:23,713] [INFO] Running command: hmmsearch --tblout GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stgb947a371-56a7-4ef3-a05f-8dba3afb7016/dqc_reference/reference_markers.hmm GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/protein.faa > /dev/null
[2024-01-24 13:30:23,981] [INFO] Task succeeded: HMMsearch
[2024-01-24 13:30:23,982] [INFO] Found 6/6 markers.
[2024-01-24 13:30:24,018] [INFO] Query marker FASTA was written to GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/markers.fasta
[2024-01-24 13:30:24,018] [INFO] Task started: Blastn
[2024-01-24 13:30:24,018] [INFO] Running command: blastn -query GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/markers.fasta -db /var/lib/cwl/stgb947a371-56a7-4ef3-a05f-8dba3afb7016/dqc_reference/reference_markers.fasta -out GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 13:30:25,049] [INFO] Task succeeded: Blastn
[2024-01-24 13:30:25,052] [INFO] Selected 16 target genomes.
[2024-01-24 13:30:25,052] [INFO] Target genome list was writen to GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/target_genomes.txt
[2024-01-24 13:30:25,063] [INFO] Task started: fastANI
[2024-01-24 13:30:25,064] [INFO] Running command: fastANI --query /var/lib/cwl/stgd2d7e9f4-0b87-4a9e-810a-0bd8a8816394/GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna.gz --refList GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/target_genomes.txt --output GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/fastani_result.tsv --threads 1
[2024-01-24 13:30:37,226] [INFO] Task succeeded: fastANI
[2024-01-24 13:30:37,226] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stgb947a371-56a7-4ef3-a05f-8dba3afb7016/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2024-01-24 13:30:37,227] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stgb947a371-56a7-4ef3-a05f-8dba3afb7016/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2024-01-24 13:30:37,240] [INFO] Found 16 fastANI hits (1 hits with ANI > threshold)
[2024-01-24 13:30:37,241] [INFO] The taxonomy check result is classified as 'conclusive'.
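The 'conclusive' call follows from the hit counts logged just before it: of 16 fastANI hits, only one exceeds the default ANI threshold (95, applied because the species-specific threshold file was missing). The sketch below reproduces that decision in Python under a simplified rule that is our assumption, not DFAST_QC's actual code: no hit above threshold gives `below_threshold`, above-threshold hits from a single species give `conclusive`, and the label `indistinguishable` for several passing species is hypothetical. The (species, ANI) pairs are copied from the taxonomy result table in this log.

```python
from typing import List, Tuple

# Default species ANI threshold; the taxonomy table reports 95 for every row
# because the species-specific threshold file was not found.
DEFAULT_THRESHOLD = 95.0

# (species, ANI) pairs copied from the first rows of the taxonomy result table.
HITS: List[Tuple[str, float]] = [
    ("Jannaschia pohangensis", 100.0),
    ("Jannaschia rubra", 80.0133),
    ("Jannaschia rubra", 79.978),
    ("Jannaschia marina", 79.8645),
]


def classify(hits: List[Tuple[str, float]], threshold: float = DEFAULT_THRESHOLD) -> str:
    """Simplified taxonomy-check decision (an assumption, not DFAST_QC's implementation)."""
    # Collect the distinct species whose ANI strictly exceeds the threshold.
    above = {species for species, ani in hits if ani > threshold}
    if not above:
        return "below_threshold"
    if len(above) == 1:
        return "conclusive"
    # Hypothetical label for the case where several species pass the threshold.
    return "indistinguishable"


if __name__ == "__main__":
    n_above = sum(1 for _, ani in HITS if ani > DEFAULT_THRESHOLD)
    print(f"{n_above} hits with ANI > threshold")
    print(classify(HITS))
```

Counting distinct species rather than hits keeps two genomes of the same species (such as the duplicated *Jannaschia seohaensis* entries) from turning a clear result into an ambiguous one.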
[2024-01-24 13:30:37,241] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Jannaschia pohangensis	strain=DSM 19073	GCA_900113875.1	390807	390807	type	True	100.0	1236	1236	95	conclusive
Jannaschia rubra	strain=CECT 5088	GCA_001403735.1	282197	282197	type	True	80.0133	631	1236	95	below_threshold
Jannaschia rubra	strain=DSM 16279	GCA_900113265.1	282197	282197	type	True	79.978	623	1236	95	below_threshold
Jannaschia marina	strain=SHC163	GCA_013404595.1	2741674	2741674	type	True	79.8645	701	1236	95	below_threshold
Jannaschia helgolandensis	strain=DSM 14858	GCA_900109285.1	188906	188906	type	True	79.3826	593	1236	95	below_threshold
Jannaschia formosa	strain=12N15	GCA_003340555.1	2259592	2259592	type	True	79.3571	648	1236	95	below_threshold
Jannaschia faecimaris	strain=DSM 100420	GCA_900107415.1	1244108	1244108	type	True	79.1901	576	1236	95	below_threshold
Jannaschia seohaensis	strain=DSM 25227	GCA_003149265.1	475081	475081	type	True	79.1085	569	1236	95	below_threshold
Jannaschia seohaensis	strain=DSM 25227	GCA_900116765.1	475081	475081	type	True	79.1085	569	1236	95	below_threshold
Alexandriicola marinus	strain=LZ-14	GCA_004000435.1	2081710	2081710	type	True	77.8228	300	1236	95	below_threshold
Rhodobacter amnigenus	strain=HSP-20	GCA_009908265.2	2852097	2852097	type	True	77.6715	323	1236	95	below_threshold
Rhodobacter calidifons	strain=M37P	GCA_011174775.1	2715277	2715277	type	True	77.5015	330	1236	95	below_threshold
Pseudoroseicyclus tamaricis	strain=CLL3-39	GCA_010435925.1	2705421	2705421	type	True	77.4049	284	1236	95	below_threshold
Hasllibacter halocynthiae	strain=DSM 29318	GCA_003003095.1	595589	595589	type	True	77.3569	258	1236	95	below_threshold
Gemmobacter fulva	strain=con5	GCA_018798885.1	2840474	2840474	type	True	77.317	278	1236	95	below_threshold
Tabrizicola sediminis	strain=DRYC-M-16	GCA_004745575.1	2486418	2486418	type	True	77.0009	267	1236	95	below_threshold
--------------------------------------------------------------------------------
[2024-01-24 13:30:37,243] [INFO] DFAST Taxonomy check result was written to GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/tc_result.tsv
[2024-01-24 13:30:37,243] [INFO] ===== Taxonomy check completed =====
[2024-01-24 13:30:37,244] [INFO] ===== Start completeness check using CheckM =====
[2024-01-24 13:30:37,244] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stgb947a371-56a7-4ef3-a05f-8dba3afb7016/dqc_reference/checkm_data
[2024-01-24 13:30:37,245] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2024-01-24 13:30:37,283] [INFO] Task started: CheckM
[2024-01-24 13:30:37,284] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/checkm_input GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/checkm_result
[2024-01-24 13:31:18,322] [INFO] Task succeeded: CheckM
[2024-01-24 13:31:18,323] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2024-01-24 13:31:18,345] [INFO] ===== Completeness check finished =====
[2024-01-24 13:31:18,346] [INFO] ===== Start GTDB Search =====
[2024-01-24 13:31:18,346] [INFO] Query marker FASTA already exists. Will reuse it. (GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/markers.fasta)
[2024-01-24 13:31:18,346] [INFO] Task started: Blastn
[2024-01-24 13:31:18,347] [INFO] Running command: blastn -query GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/markers.fasta -db /var/lib/cwl/stgb947a371-56a7-4ef3-a05f-8dba3afb7016/dqc_reference/reference_markers_gtdb.fasta -out GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 13:31:20,025] [INFO] Task succeeded: Blastn
[2024-01-24 13:31:20,028] [INFO] Selected 11 target genomes.
[2024-01-24 13:31:20,028] [INFO] Target genome list was writen to GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/target_genomes_gtdb.txt
[2024-01-24 13:31:20,051] [INFO] Task started: fastANI
[2024-01-24 13:31:20,051] [INFO] Running command: fastANI --query /var/lib/cwl/stgd2d7e9f4-0b87-4a9e-810a-0bd8a8816394/GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna.gz --refList GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/target_genomes_gtdb.txt --output GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2024-01-24 13:31:28,679] [INFO] Task succeeded: fastANI
[2024-01-24 13:31:28,692] [INFO] Found 11 fastANI hits (1 hits with ANI > circumscription radius)
[2024-01-24 13:31:28,692] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_900113875.1	s__Jannaschia pohangensis	100.0	1236	1236	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Jannaschia	95.0	N/A	N/A	N/A	N/A	1	conclusive
GCF_001403735.1	s__Jannaschia rubra	80.0318	629	1236	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Jannaschia	95.0	99.99	99.99	0.99	0.99	2	-
GCF_001403795.1	s__Jannaschia donghaensis	79.9677	681	1236	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Jannaschia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_013404595.1	s__Jannaschia marina	79.8587	701	1236	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Jannaschia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900109285.1	s__Jannaschia helgolandensis	79.3792	594	1236	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Jannaschia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003340555.1	s__Jannaschia formosa	79.3576	648	1236	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Jannaschia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900107415.1	s__Jannaschia faecimaris	79.1822	577	1236	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Jannaschia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900116765.1	s__Jannaschia seohaensis	79.1004	570	1236	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Jannaschia	95.0	100.00	100.00	1.00	1.00	2	-
GCF_018139985.1	s__JAGSOU01 sp018139985	77.9934	386	1236	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__JAGSOU01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_004005435.1	s__CCMM004 sp004005435	77.7763	373	1236	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__CCMM004	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003003095.1	s__Hasllibacter halocynthiae	77.3569	258	1236	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Hasllibacter	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2024-01-24 13:31:28,694] [INFO] GTDB search result was written to GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/result_gtdb.tsv
[2024-01-24 13:31:28,694] [INFO] ===== GTDB Search completed =====
[2024-01-24 13:31:28,698] [INFO] DFAST_QC result json was written to GCF_900113875.1_IMG-taxon_2622736536_annotated_assembly_genomic.fna/dqc_result.json
[2024-01-24 13:31:28,698] [INFO] DFAST_QC completed!
[2024-01-24 13:31:28,698] [INFO] Total running time: 0h1m17s
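In both result tables, `matched_fragments` and `total_fragments` come from fastANI: the number of query fragments with a bidirectional mapping to the reference, and the total number of fragments cut from the query genome. Their ratio is the share of the query genome that aligned, which is why the self-match (1236/1236) differs so sharply from the roughly 80% ANI neighbours that align only about half the fragments. The Python sketch below does that arithmetic on a few rows copied from the GTDB search table and checks each ANI against the 95.0 circumscription radius reported for every row in this run. The helper name `aligned_fraction` is hypothetical and this is not a DFAST_QC function.

```python
from typing import Dict, Tuple

# (ani, matched_fragments, total_fragments) copied from the GTDB search table above.
ROWS: Dict[str, Tuple[float, int, int]] = {
    "s__Jannaschia pohangensis": (100.0, 1236, 1236),
    "s__Jannaschia rubra": (80.0318, 629, 1236),
    "s__Jannaschia marina": (79.8587, 701, 1236),
    "s__Hasllibacter halocynthiae": (77.3569, 258, 1236),
}

# ani_circumscription_radius reported for every row in this run.
RADIUS = 95.0


def aligned_fraction(matched: int, total: int) -> float:
    """Hypothetical helper: share of query fragments that fastANI mapped to the reference."""
    return matched / total


if __name__ == "__main__":
    for species, (ani, matched, total) in ROWS.items():
        af = aligned_fraction(matched, total)
        # Mirrors the log's "ANI > circumscription radius" comparison.
        within = "within radius" if ani > RADIUS else "outside radius"
        print(f"{species}\tANI={ani:.2f}\tAF={af:.3f}\t{within}")
```

Only the self-match falls within the radius, which is consistent with the log's "1 hits with ANI > circumscription radius".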