[2024-01-25 19:58:35,667] [INFO] DFAST_QC pipeline started.
[2024-01-25 19:58:35,668] [INFO] DFAST_QC version: 0.5.7
[2024-01-25 19:58:35,668] [INFO] DQC Reference Directory: /var/lib/cwl/stge33faeae-640f-45f7-b6fc-2baabefdd002/dqc_reference
[2024-01-25 19:58:36,794] [INFO] ===== Start taxonomy check using ANI =====
[2024-01-25 19:58:36,795] [INFO] Task started: Prodigal
[2024-01-25 19:58:36,795] [INFO] Running command: gunzip -c /var/lib/cwl/stg9fbdd83a-6cfe-4c27-b319-5eacf4416c39/GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna.gz | prodigal -d GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/cds.fna -a GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/protein.faa -g 11 -q > /dev/null
[2024-01-25 19:58:46,238] [INFO] Task succeeded: Prodigal
[2024-01-25 19:58:46,238] [INFO] Task started: HMMsearch
[2024-01-25 19:58:46,238] [INFO] Running command: hmmsearch --tblout GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stge33faeae-640f-45f7-b6fc-2baabefdd002/dqc_reference/reference_markers.hmm GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/protein.faa > /dev/null
[2024-01-25 19:58:46,456] [INFO] Task succeeded: HMMsearch
[2024-01-25 19:58:46,457] [INFO] Found 6/6 markers.
[2024-01-25 19:58:46,489] [INFO] Query marker FASTA was written to GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/markers.fasta
[2024-01-25 19:58:46,489] [INFO] Task started: Blastn
[2024-01-25 19:58:46,489] [INFO] Running command: blastn -query GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/markers.fasta -db /var/lib/cwl/stge33faeae-640f-45f7-b6fc-2baabefdd002/dqc_reference/reference_markers.fasta -out GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-25 19:58:47,409] [INFO] Task succeeded: Blastn
[2024-01-25 19:58:47,412] [INFO] Selected 16 target genomes.
[2024-01-25 19:58:47,412] [INFO] Target genome list was writen to GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/target_genomes.txt
[2024-01-25 19:58:47,422] [INFO] Task started: fastANI
[2024-01-25 19:58:47,422] [INFO] Running command: fastANI --query /var/lib/cwl/stg9fbdd83a-6cfe-4c27-b319-5eacf4416c39/GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna.gz --refList GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/target_genomes.txt --output GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/fastani_result.tsv --threads 1
[2024-01-25 19:59:01,048] [INFO] Task succeeded: fastANI
[2024-01-25 19:59:01,048] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stge33faeae-640f-45f7-b6fc-2baabefdd002/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2024-01-25 19:59:01,048] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stge33faeae-640f-45f7-b6fc-2baabefdd002/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2024-01-25 19:59:01,061] [INFO] Found 16 fastANI hits (2 hits with ANI > threshold)
[2024-01-25 19:59:01,061] [INFO] The taxonomy check result is classified as 'conclusive'.
[2024-01-25 19:59:01,061] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Jannaschia rubra	strain=CECT 5088	GCA_001403735.1	282197	282197	type	True	100.0	1208	1209	95	conclusive
Jannaschia rubra	strain=DSM 16279	GCA_900113265.1	282197	282197	type	True	99.9893	1164	1209	95	conclusive
Jannaschia formosa	strain=12N15	GCA_003340555.1	2259592	2259592	type	True	80.1042	664	1209	95	below_threshold
Jannaschia pohangensis	strain=DSM 19073	GCA_900113875.1	390807	390807	type	True	80.0371	633	1209	95	below_threshold
Jannaschia marina	strain=SHC163	GCA_013404595.1	2741674	2741674	type	True	80.0343	650	1209	95	below_threshold
Jannaschia helgolandensis	strain=DSM 14858	GCA_900109285.1	188906	188906	type	True	79.2922	571	1209	95	below_threshold
Jannaschia seohaensis	strain=DSM 25227	GCA_900116765.1	475081	475081	type	True	79.1804	570	1209	95	below_threshold
Jannaschia seohaensis	strain=DSM 25227	GCA_003149265.1	475081	475081	type	True	79.1769	571	1209	95	below_threshold
Palleronia rufa	strain=MOLA 401	GCA_000743715.1	1530186	1530186	type	True	78.5181	384	1209	95	below_threshold
Wenxinia saemankumensis	strain=DSM 100565	GCA_900141735.1	1447782	1447782	type	True	77.8687	411	1209	95	below_threshold
Paracoccus gahaiensis	strain=KCTC 42687	GCA_005048225.1	1706839	1706839	type	True	77.7592	371	1209	95	below_threshold
Cereibacter ovatus	strain=JA234	GCA_900207575.1	439529	439529	type	True	77.5973	342	1209	95	below_threshold
Rhodobacter amnigenus	strain=HSP-20	GCA_019130055.1	2852097	2852097	type	True	77.4503	345	1209	95	below_threshold
Rhodobacter amnigenus	strain=HSP-20	GCA_009908265.2	2852097	2852097	type	True	77.4421	346	1209	95	below_threshold
Rhodobacter ruber	strain=CCP-1	GCA_009908315.1	1985673	1985673	type	True	77.1102	321	1209	95	below_threshold
Phaeovulum veldkampii	strain=DSM 11550	GCA_003034995.1	33049	33049	type	True	76.9172	251	1209	95	below_threshold
--------------------------------------------------------------------------------
[2024-01-25 19:59:01,065] [INFO] DFAST Taxonomy check result was written to GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/tc_result.tsv
[2024-01-25 19:59:01,066] [INFO] ===== Taxonomy check completed =====
[2024-01-25 19:59:01,067] [INFO] ===== Start completeness check using CheckM =====
[2024-01-25 19:59:01,067] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stge33faeae-640f-45f7-b6fc-2baabefdd002/dqc_reference/checkm_data
[2024-01-25 19:59:01,068] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2024-01-25 19:59:01,116] [INFO] Task started: CheckM
[2024-01-25 19:59:01,117] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/checkm_input GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/checkm_result
[2024-01-25 19:59:39,161] [INFO] Task succeeded: CheckM
[2024-01-25 19:59:39,162] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2024-01-25 19:59:39,214] [INFO] ===== Completeness check finished =====
[2024-01-25 19:59:39,214] [INFO] ===== Start GTDB Search =====
[2024-01-25 19:59:39,215] [INFO] Query marker FASTA already exists. Will reuse it. (GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/markers.fasta)
[2024-01-25 19:59:39,215] [INFO] Task started: Blastn
[2024-01-25 19:59:39,215] [INFO] Running command: blastn -query GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/markers.fasta -db /var/lib/cwl/stge33faeae-640f-45f7-b6fc-2baabefdd002/dqc_reference/reference_markers_gtdb.fasta -out GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-25 19:59:41,174] [INFO] Task succeeded: Blastn
[2024-01-25 19:59:41,177] [INFO] Selected 13 target genomes.
[2024-01-25 19:59:41,177] [INFO] Target genome list was writen to GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/target_genomes_gtdb.txt
[2024-01-25 19:59:41,196] [INFO] Task started: fastANI
[2024-01-25 19:59:41,196] [INFO] Running command: fastANI --query /var/lib/cwl/stg9fbdd83a-6cfe-4c27-b319-5eacf4416c39/GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna.gz --refList GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/target_genomes_gtdb.txt --output GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2024-01-25 19:59:51,729] [INFO] Task succeeded: fastANI
[2024-01-25 19:59:51,737] [INFO] Found 13 fastANI hits (1 hits with ANI > circumscription radius)
[2024-01-25 19:59:51,738] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_001403735.1	s__Jannaschia rubra	100.0	1208	1209	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Jannaschia	95.0	99.99	99.99	0.99	0.99	2	conclusive
GCF_003340555.1	s__Jannaschia formosa	80.1054	663	1209	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Jannaschia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_013404595.1	s__Jannaschia marina	80.0474	648	1209	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Jannaschia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900113875.1	s__Jannaschia pohangensis	80.0423	632	1209	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Jannaschia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001408515.1	s__Jannaschia seosinensis	79.8482	532	1209	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Jannaschia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001403795.1	s__Jannaschia donghaensis	79.5055	589	1209	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Jannaschia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900109285.1	s__Jannaschia helgolandensis	79.2923	571	1209	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Jannaschia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900116765.1	s__Jannaschia seohaensis	79.1884	569	1209	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Jannaschia	95.0	100.00	100.00	1.00	1.00	2	-
GCF_900107415.1	s__Jannaschia faecimaris	78.6184	514	1209	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Jannaschia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900141735.1	s__Wenxinia saemankumensis	77.8468	413	1209	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Wenxinia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_005048225.1	s__Paracoccus gahaiensis	77.7573	372	1209	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Paracoccus	95.0	N/A	N/A	N/A	N/A	1	-
GCA_004005435.1	s__CCMM004 sp004005435	77.7465	442	1209	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__CCMM004	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003034995.1	s__Phaeovulum veldkampii	76.9189	252	1209	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Phaeovulum	95.0	99.98	99.97	0.97	0.94	3	-
--------------------------------------------------------------------------------
[2024-01-25 19:59:51,739] [INFO] GTDB search result was written to GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/result_gtdb.tsv
[2024-01-25 19:59:51,740] [INFO] ===== GTDB Search completed =====
[2024-01-25 19:59:51,746] [INFO] DFAST_QC result json was written to GCF_001403735.1_J.rubraCECT5088_SeqMan_Prokka_genomic.fna/dqc_result.json
[2024-01-25 19:59:51,746] [INFO] DFAST_QC completed!
[2024-01-25 19:59:51,746] [INFO] Total running time: 0h1m16s