[2024-01-24 13:31:35,239] [INFO] DFAST_QC pipeline started.
[2024-01-24 13:31:35,246] [INFO] DFAST_QC version: 0.5.7
[2024-01-24 13:31:35,246] [INFO] DQC Reference Directory: /var/lib/cwl/stg2b6dff62-342c-477a-905c-246842e9b38d/dqc_reference
[2024-01-24 13:31:36,737] [INFO] ===== Start taxonomy check using ANI =====
[2024-01-24 13:31:36,738] [INFO] Task started: Prodigal
[2024-01-24 13:31:36,738] [INFO] Running command: gunzip -c /var/lib/cwl/stgd1454351-ac54-49b6-8530-52ceb83cd8ed/GCF_018599245.1_ASM1859924v1_genomic.fna.gz | prodigal -d GCF_018599245.1_ASM1859924v1_genomic.fna/cds.fna -a GCF_018599245.1_ASM1859924v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2024-01-24 13:31:48,490] [INFO] Task succeeded: Prodigal
[2024-01-24 13:31:48,491] [INFO] Task started: HMMsearch
[2024-01-24 13:31:48,491] [INFO] Running command: hmmsearch --tblout GCF_018599245.1_ASM1859924v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg2b6dff62-342c-477a-905c-246842e9b38d/dqc_reference/reference_markers.hmm GCF_018599245.1_ASM1859924v1_genomic.fna/protein.faa > /dev/null
[2024-01-24 13:31:48,827] [INFO] Task succeeded: HMMsearch
[2024-01-24 13:31:48,828] [INFO] Found 6/6 markers.
[2024-01-24 13:31:48,862] [INFO] Query marker FASTA was written to GCF_018599245.1_ASM1859924v1_genomic.fna/markers.fasta
[2024-01-24 13:31:48,862] [INFO] Task started: Blastn
[2024-01-24 13:31:48,862] [INFO] Running command: blastn -query GCF_018599245.1_ASM1859924v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg2b6dff62-342c-477a-905c-246842e9b38d/dqc_reference/reference_markers.fasta -out GCF_018599245.1_ASM1859924v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 13:31:49,698] [INFO] Task succeeded: Blastn
[2024-01-24 13:31:49,703] [INFO] Selected 21 target genomes.
[2024-01-24 13:31:49,704] [INFO] Target genome list was writen to GCF_018599245.1_ASM1859924v1_genomic.fna/target_genomes.txt
[2024-01-24 13:31:49,717] [INFO] Task started: fastANI
[2024-01-24 13:31:49,717] [INFO] Running command: fastANI --query /var/lib/cwl/stgd1454351-ac54-49b6-8530-52ceb83cd8ed/GCF_018599245.1_ASM1859924v1_genomic.fna.gz --refList GCF_018599245.1_ASM1859924v1_genomic.fna/target_genomes.txt --output GCF_018599245.1_ASM1859924v1_genomic.fna/fastani_result.tsv --threads 1
[2024-01-24 13:32:05,238] [INFO] Task succeeded: fastANI
[2024-01-24 13:32:05,239] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg2b6dff62-342c-477a-905c-246842e9b38d/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2024-01-24 13:32:05,239] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg2b6dff62-342c-477a-905c-246842e9b38d/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2024-01-24 13:32:05,255] [INFO] Found 21 fastANI hits (0 hits with ANI > threshold)
[2024-01-24 13:32:05,255] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2024-01-24 13:32:05,256] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Aliiroseovarius crassostreae	strain=CV919-312	GCA_001307765.1	154981	154981	type	True	78.7736	506	1106	95	below_threshold
Aliiroseovarius crassostreae	strain=DSM 16950	GCA_900116725.1	154981	154981	type	True	78.7518	505	1106	95	below_threshold
Aliiroseovarius pelagivivens	strain=CECT 8811	GCA_900302485.1	1639690	1639690	type	True	78.2443	363	1106	95	below_threshold
Aliiroseovarius marinus	strain=A6024	GCA_004360145.1	2500159	2500159	type	True	78.1083	368	1106	95	below_threshold
Aliiroseovarius halocynthiae	strain=MA1-10	GCA_007004645.1	985055	985055	type	True	77.9595	361	1106	95	below_threshold
Aliiroseovarius sediminilitoris	strain=DSM 29439	GCA_900109955.1	1173584	1173584	type	True	77.7908	351	1106	95	below_threshold
Aliiroseovarius zhejiangensis	strain=KCTC 42443	GCA_014656375.1	1632025	1632025	type	True	77.7052	373	1106	95	below_threshold
Phaeobacter gallaeciensis	strain=DSM 26640	GCA_000511385.1	60890	60890	type	True	77.3078	187	1106	95	below_threshold
Pseudophaeobacter arcticus	strain=DSM 23566	GCA_000473205.1	385492	385492	type	True	77.2978	205	1106	95	below_threshold
Phaeobacter gallaeciensis	strain=DSM 26640	GCA_000819625.1	60890	60890	type	True	77.2006	188	1106	95	below_threshold
Shimia marina	strain=CECT 7688	GCA_001458175.1	321267	321267	type	True	77.0272	189	1106	95	below_threshold
Shimia marina	strain=DSM 26895	GCA_900112745.1	321267	321267	type	True	77.0066	189	1106	95	below_threshold
Roseovarius gaetbuli	strain=CECT 8370	GCA_900172365.1	1356575	1356575	type	True	76.774	194	1106	95	below_threshold
Roseovarius albus	strain=CECT 7450	GCA_900172335.1	1247867	1247867	type	True	76.7447	141	1106	95	below_threshold
Celeribacter neptunius	strain=DSM 26471	GCA_900113955.1	588602	588602	type	True	76.7241	148	1106	95	below_threshold
Alexandriicola marinus	strain=LZ-14	GCA_004000435.1	2081710	2081710	type	True	76.6377	86	1106	95	below_threshold
Roseibacterium elongatum	strain=DFL-43	GCA_000590925.1	159346	159346	type	True	76.5683	107	1106	95	below_threshold
Ruegeria halocynthiae	strain=DSM 27839	GCA_900106805.1	985054	985054	type	True	76.5634	157	1106	95	below_threshold
Salipiger pallidus	strain=CGMCC 1.15762	GCA_014643635.1	1775170	1775170	type	True	76.4273	103	1106	95	below_threshold
Salipiger marinus	strain=DSM 26424	GCA_900100085.1	555512	555512	type	True	76.1468	147	1106	95	below_threshold
Rhodobacter amnigenus	strain=HSP-20	GCA_009908265.2	2852097	2852097	type	True	76.0413	104	1106	95	below_threshold
--------------------------------------------------------------------------------
[2024-01-24 13:32:05,258] [INFO] DFAST Taxonomy check result was written to GCF_018599245.1_ASM1859924v1_genomic.fna/tc_result.tsv
[2024-01-24 13:32:05,259] [INFO] ===== Taxonomy check completed =====
[2024-01-24 13:32:05,259] [INFO] ===== Start completeness check using CheckM =====
[2024-01-24 13:32:05,259] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg2b6dff62-342c-477a-905c-246842e9b38d/dqc_reference/checkm_data
[2024-01-24 13:32:05,260] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2024-01-24 13:32:05,295] [INFO] Task started: CheckM
[2024-01-24 13:32:05,295] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCF_018599245.1_ASM1859924v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCF_018599245.1_ASM1859924v1_genomic.fna/checkm_input GCF_018599245.1_ASM1859924v1_genomic.fna/checkm_result
[2024-01-24 13:32:44,375] [INFO] Task succeeded: CheckM
[2024-01-24 13:32:44,376] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2024-01-24 13:32:44,392] [INFO] ===== Completeness check finished =====
[2024-01-24 13:32:44,392] [INFO] ===== Start GTDB Search =====
[2024-01-24 13:32:44,393] [INFO] Query marker FASTA already exists. Will reuse it. (GCF_018599245.1_ASM1859924v1_genomic.fna/markers.fasta)
[2024-01-24 13:32:44,393] [INFO] Task started: Blastn
[2024-01-24 13:32:44,393] [INFO] Running command: blastn -query GCF_018599245.1_ASM1859924v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg2b6dff62-342c-477a-905c-246842e9b38d/dqc_reference/reference_markers_gtdb.fasta -out GCF_018599245.1_ASM1859924v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 13:32:45,451] [INFO] Task succeeded: Blastn
[2024-01-24 13:32:45,454] [INFO] Selected 13 target genomes.
[2024-01-24 13:32:45,454] [INFO] Target genome list was writen to GCF_018599245.1_ASM1859924v1_genomic.fna/target_genomes_gtdb.txt
[2024-01-24 13:32:45,466] [INFO] Task started: fastANI
[2024-01-24 13:32:45,466] [INFO] Running command: fastANI --query /var/lib/cwl/stgd1454351-ac54-49b6-8530-52ceb83cd8ed/GCF_018599245.1_ASM1859924v1_genomic.fna.gz --refList GCF_018599245.1_ASM1859924v1_genomic.fna/target_genomes_gtdb.txt --output GCF_018599245.1_ASM1859924v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2024-01-24 13:32:54,597] [INFO] Task succeeded: fastANI
[2024-01-24 13:32:54,615] [INFO] Found 13 fastANI hits (1 hits with ANI > circumscription radius)
[2024-01-24 13:32:54,616] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_018599245.1	s__Aliiroseovarius lamellibrachiae	100.0	1106	1106	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Aliiroseovarius	95.0	N/A	N/A	N/A	N/A	1	conclusive
GCA_002340515.1	s__Aliiroseovarius sp002340515	89.5632	816	1106	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Aliiroseovarius	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001307765.1	s__Aliiroseovarius crassostreae	78.7947	503	1106	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Aliiroseovarius	95.0	96.06	95.64	0.88	0.84	18	-
GCF_900302485.1	s__Aliiroseovarius pelagivivens	78.2343	364	1106	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Aliiroseovarius	95.0	N/A	N/A	N/A	N/A	1	-
GCF_010500215.1	s__Aliiroseovarius sp010500215	78.1554	374	1106	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Aliiroseovarius	95.0	N/A	N/A	N/A	N/A	1	-
GCF_004360145.1	s__Aliiroseovarius marinus	78.1177	367	1106	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Aliiroseovarius	95.0	N/A	N/A	N/A	N/A	1	-
GCF_007004645.1	s__Aliiroseovarius halocynthiae	77.9691	360	1106	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Aliiroseovarius	95.0	N/A	N/A	N/A	N/A	1	-
GCF_014656375.1	s__Aliiroseovarius zhejiangensis	77.7052	373	1106	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Aliiroseovarius	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000511385.1	s__Phaeobacter gallaeciensis	77.338	185	1106	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Phaeobacter	95.0	99.42	98.01	0.98	0.96	8	-
GCF_000473205.1	s__Pseudophaeobacter arcticus	77.2978	205	1106	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Pseudophaeobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001458175.1	s__Shimia marina	77.0272	189	1106	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Shimia	95.0	98.13	96.28	0.94	0.89	3	-
GCF_011806455.1	s__Celeribacter sp011806455	76.8555	153	1106	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Celeribacter	95.0	97.95	97.95	0.96	0.96	2	-
GCA_002708925.1	s__Thalassobius sp002708925	76.7018	171	1106	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Thalassobius	95.0	99.91	99.91	0.84	0.84	2	-
--------------------------------------------------------------------------------
[2024-01-24 13:32:54,618] [INFO] GTDB search result was written to GCF_018599245.1_ASM1859924v1_genomic.fna/result_gtdb.tsv
[2024-01-24 13:32:54,619] [INFO] ===== GTDB Search completed =====
[2024-01-24 13:32:54,631] [INFO] DFAST_QC result json was written to GCF_018599245.1_ASM1859924v1_genomic.fna/dqc_result.json
[2024-01-24 13:32:54,631] [INFO] DFAST_QC completed!
[2024-01-24 13:32:54,631] [INFO] Total running time: 0h1m19s