[2023-06-27 08:37:16,374] [INFO] DFAST_QC pipeline started.
[2023-06-27 08:37:16,376] [INFO] DFAST_QC version: 0.5.7
[2023-06-27 08:37:16,377] [INFO] DQC Reference Directory: /var/lib/cwl/stg2d804eae-d5da-452c-b39b-4134d3c7f55b/dqc_reference
[2023-06-27 08:37:17,594] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-27 08:37:17,595] [INFO] Task started: Prodigal
[2023-06-27 08:37:17,596] [INFO] Running command: gunzip -c /var/lib/cwl/stg788958d3-bace-439c-ab3d-b697c81288b5/GCA_002255495.1_ASM225549v1_genomic.fna.gz | prodigal -d GCA_002255495.1_ASM225549v1_genomic.fna/cds.fna -a GCA_002255495.1_ASM225549v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-27 08:37:26,410] [INFO] Task succeeded: Prodigal
[2023-06-27 08:37:26,411] [INFO] Task started: HMMsearch
[2023-06-27 08:37:26,411] [INFO] Running command: hmmsearch --tblout GCA_002255495.1_ASM225549v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg2d804eae-d5da-452c-b39b-4134d3c7f55b/dqc_reference/reference_markers.hmm GCA_002255495.1_ASM225549v1_genomic.fna/protein.faa > /dev/null
[2023-06-27 08:37:26,712] [INFO] Task succeeded: HMMsearch
[2023-06-27 08:37:26,713] [WARNING] Found 5/6 markers. [/var/lib/cwl/stg788958d3-bace-439c-ab3d-b697c81288b5/GCA_002255495.1_ASM225549v1_genomic.fna.gz]
[2023-06-27 08:37:26,742] [INFO] Query marker FASTA was written to GCA_002255495.1_ASM225549v1_genomic.fna/markers.fasta
[2023-06-27 08:37:26,743] [INFO] Task started: Blastn
[2023-06-27 08:37:26,743] [INFO] Running command: blastn -query GCA_002255495.1_ASM225549v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg2d804eae-d5da-452c-b39b-4134d3c7f55b/dqc_reference/reference_markers.fasta -out GCA_002255495.1_ASM225549v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-27 08:37:27,403] [INFO] Task succeeded: Blastn
[2023-06-27 08:37:27,406] [INFO] Selected 25 target genomes.
[2023-06-27 08:37:27,407] [INFO] Target genome list was writen to GCA_002255495.1_ASM225549v1_genomic.fna/target_genomes.txt
[2023-06-27 08:37:27,417] [INFO] Task started: fastANI
[2023-06-27 08:37:27,417] [INFO] Running command: fastANI --query /var/lib/cwl/stg788958d3-bace-439c-ab3d-b697c81288b5/GCA_002255495.1_ASM225549v1_genomic.fna.gz --refList GCA_002255495.1_ASM225549v1_genomic.fna/target_genomes.txt --output GCA_002255495.1_ASM225549v1_genomic.fna/fastani_result.tsv --threads 1
[2023-06-27 08:37:43,469] [INFO] Task succeeded: fastANI
[2023-06-27 08:37:43,469] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg2d804eae-d5da-452c-b39b-4134d3c7f55b/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-27 08:37:43,470] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg2d804eae-d5da-452c-b39b-4134d3c7f55b/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-27 08:37:43,483] [INFO] Found 16 fastANI hits (0 hits with ANI > threshold)
[2023-06-27 08:37:43,483] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-06-27 08:37:43,484] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Acidiphilium multivorum	strain=AIU301	GCA_000202835.1	62140	62140	type	True	76.8992	186	914	95	below_threshold
Acidiphilium multivorum	strain=AIU301	GCA_000964345.1	62140	62140	type	True	76.8286	177	914	95	below_threshold
Acidiphilium iwatense	strain=KCTC 23505	GCA_021556475.1	768198	768198	type	True	76.7291	224	914	95	below_threshold
Rhodovastum atsumiense	strain=G2-11	GCA_937425535.1	504468	504468	type	True	76.3173	135	914	95	below_threshold
Acidisoma cellulosilyticum	strain=HW T5.17	GCA_020621385.1	2802395	2802395	type	True	76.1085	120	914	95	below_threshold
Roseomonas oryzae	strain=KCTC 42542	GCA_008386565.1	1608942	1608942	type	True	75.9435	86	914	95	below_threshold
Roseomonas cervicalis	strain=ATCC 49957	GCA_000164635.1	204525	204525	type	True	75.7107	132	914	95	below_threshold
Rhodovarius lipocyclicus	strain=CCUG 44693	GCA_009900765.1	268410	268410	type	True	75.6394	84	914	95	below_threshold
Roseomonas deserti	strain=M3	GCA_001982615.1	1817963	1817963	type	True	75.6241	105	914	95	below_threshold
Roseococcus microcysteis	strain=NIBR12	GCA_014764365.1	2771361	2771361	type	True	75.6021	89	914	95	below_threshold
Belnapia rosea	strain=CGMCC 1.10758	GCA_900104205.1	938405	938405	type	True	75.534	106	914	95	below_threshold
Belnapia rosea	strain=CPCC 100156	GCA_900101615.1	938405	938405	type	True	75.519	111	914	95	below_threshold
Rhodovarius crocodyli	strain=CCP-6	GCA_004005855.1	1979269	1979269	type	True	75.3962	83	914	95	below_threshold
Roseomonas coralli	strain=M0104	GCA_009829925.1	2545983	2545983	type	True	75.385	100	914	95	below_threshold
Roseomonas rubea	strain=MO17	GCA_016106015.1	2748666	2748666	type	True	75.3629	87	914	95	below_threshold
Caulobacter endophyticus	strain=774	GCA_003116815.1	2172652	2172652	type	True	75.1133	50	914	95	below_threshold
--------------------------------------------------------------------------------
[2023-06-27 08:37:43,486] [INFO] DFAST Taxonomy check result was written to GCA_002255495.1_ASM225549v1_genomic.fna/tc_result.tsv
[2023-06-27 08:37:43,486] [INFO] ===== Taxonomy check completed =====
[2023-06-27 08:37:43,486] [INFO] ===== Start completeness check using CheckM =====
[2023-06-27 08:37:43,487] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg2d804eae-d5da-452c-b39b-4134d3c7f55b/dqc_reference/checkm_data
[2023-06-27 08:37:43,488] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-27 08:37:43,519] [INFO] Task started: CheckM
[2023-06-27 08:37:43,519] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_002255495.1_ASM225549v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_002255495.1_ASM225549v1_genomic.fna/checkm_input GCA_002255495.1_ASM225549v1_genomic.fna/checkm_result
[2023-06-27 08:38:15,272] [INFO] Task succeeded: CheckM
[2023-06-27 08:38:15,274] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 98.61%
Contamintation: 0.52%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-06-27 08:38:15,319] [INFO] ===== Completeness check finished =====
[2023-06-27 08:38:15,320] [INFO] ===== Start GTDB Search =====
[2023-06-27 08:38:15,320] [INFO] Query marker FASTA already exists. Will reuse it. (GCA_002255495.1_ASM225549v1_genomic.fna/markers.fasta)
[2023-06-27 08:38:15,321] [INFO] Task started: Blastn
[2023-06-27 08:38:15,321] [INFO] Running command: blastn -query GCA_002255495.1_ASM225549v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg2d804eae-d5da-452c-b39b-4134d3c7f55b/dqc_reference/reference_markers_gtdb.fasta -out GCA_002255495.1_ASM225549v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-27 08:38:16,324] [INFO] Task succeeded: Blastn
[2023-06-27 08:38:16,328] [INFO] Selected 21 target genomes.
[2023-06-27 08:38:16,328] [INFO] Target genome list was writen to GCA_002255495.1_ASM225549v1_genomic.fna/target_genomes_gtdb.txt
[2023-06-27 08:38:16,336] [INFO] Task started: fastANI
[2023-06-27 08:38:16,336] [INFO] Running command: fastANI --query /var/lib/cwl/stg788958d3-bace-439c-ab3d-b697c81288b5/GCA_002255495.1_ASM225549v1_genomic.fna.gz --refList GCA_002255495.1_ASM225549v1_genomic.fna/target_genomes_gtdb.txt --output GCA_002255495.1_ASM225549v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-27 08:38:31,830] [INFO] Task succeeded: fastANI
[2023-06-27 08:38:31,853] [INFO] Found 19 fastANI hits (1 hits with ANI > circumscription radius)
[2023-06-27 08:38:31,853] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCA_002255495.1	s__20-60-12 sp002255495	100.0	901	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__20-60-12	95.0	N/A	N/A	N/A	N/A	1	conclusive
GCF_900156265.1	s__Acidiphilium rubrum	77.092	207	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Acidiphilium	95.0	97.80	96.68	0.89	0.85	4	-
GCF_000202835.1	s__Acidiphilium multivorum	76.886	187	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Acidiphilium	95.0	98.69	98.42	0.90	0.87	6	-
GCF_008630155.1	s__Rhodovastum atsumiense	76.3019	135	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Rhodovastum	95.0	N/A	N/A	N/A	N/A	1	-
GCA_002255575.1	s__Acidocella sp002255575	76.2882	145	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Acidocella	95.0	99.72	99.72	0.92	0.92	2	-
GCA_011332665.1	s__Acidibrevibacterium sp011332665	76.111	111	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Acidibrevibacterium	95.0	N/A	N/A	N/A	N/A	1	-
GCA_019241425.1	s__JAFAXA01 sp019241425	76.0693	80	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__JAFAXA01	95.0	N/A	N/A	N/A	N/A	1	-
GCF_008386565.1	s__Roseomonas oryzae	75.9256	87	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Roseomonas	95.0	N/A	N/A	N/A	N/A	1	-
GCF_009765975.1	s__Rhodopila sp009765975	75.908	102	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Rhodopila	95.0	N/A	N/A	N/A	N/A	1	-
GCA_903851415.1	s__Rhodopila sp903851415	75.9027	102	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Rhodopila	95.0	99.89	99.89	0.92	0.92	2	-
GCA_003133265.1	s__PALSA-911 sp003133265	75.7329	102	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__PALSA-911	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000164635.1	s__Roseomonas cervicalis	75.7007	133	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Roseomonas	95.0	N/A	N/A	N/A	N/A	1	-
GCA_903845065.1	s__CAIYPH01 sp903845065	75.6545	119	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__CAIYPH01	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900101615.1	s__Belnapia rosea	75.5283	110	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Belnapia	95.0	98.36	98.36	0.89	0.89	2	-
GCA_014376425.1	s__Rhodovarius sp014376425	75.2555	55	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__Rhodovarius	95.0	N/A	N/A	N/A	N/A	1	-
GCA_903896535.1	s__CAITYE01 sp903896535	75.2511	81	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__CAITYE01	95.0	99.30	99.07	0.87	0.85	4	-
GCA_016792765.1	s__SBBV01 sp016792765	75.1925	56	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Bin65;f__Bin65;g__SBBV01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_003164305.1	s__BOG-934 sp003164305	75.0822	50	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Acetobacterales;f__Acetobacteraceae;g__BOG-934	95.0	N/A	N/A	N/A	N/A	1	-
GCA_002238725.1	s__Bin65 sp002238725	74.9291	50	914	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Bin65;f__Bin65;g__Bin65	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2023-06-27 08:38:31,873] [INFO] GTDB search result was written to GCA_002255495.1_ASM225549v1_genomic.fna/result_gtdb.tsv
[2023-06-27 08:38:31,875] [INFO] ===== GTDB Search completed =====
[2023-06-27 08:38:31,881] [INFO] DFAST_QC result json was written to GCA_002255495.1_ASM225549v1_genomic.fna/dqc_result.json
[2023-06-27 08:38:31,882] [INFO] DFAST_QC completed!
[2023-06-27 08:38:31,882] [INFO] Total running time: 0h1m16s