[2023-06-18 13:19:22,925] [INFO] DFAST_QC pipeline started.
[2023-06-18 13:19:22,932] [INFO] DFAST_QC version: 0.5.7
[2023-06-18 13:19:22,933] [INFO] DQC Reference Directory: /var/lib/cwl/stg2580c0d3-be68-43fc-a426-8f2a18f70a3f/dqc_reference
[2023-06-18 13:19:24,100] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-18 13:19:24,101] [INFO] Task started: Prodigal
[2023-06-18 13:19:24,101] [INFO] Running command: gunzip -c /var/lib/cwl/stga1ba7f8b-9cbd-4fe7-80cb-45239d3d35d0/GCA_018971285.1_ASM1897128v1_genomic.fna.gz | prodigal -d GCA_018971285.1_ASM1897128v1_genomic.fna/cds.fna -a GCA_018971285.1_ASM1897128v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-18 13:19:28,739] [INFO] Task succeeded: Prodigal
[2023-06-18 13:19:28,739] [INFO] Task started: HMMsearch
[2023-06-18 13:19:28,739] [INFO] Running command: hmmsearch --tblout GCA_018971285.1_ASM1897128v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg2580c0d3-be68-43fc-a426-8f2a18f70a3f/dqc_reference/reference_markers.hmm GCA_018971285.1_ASM1897128v1_genomic.fna/protein.faa > /dev/null
[2023-06-18 13:19:28,924] [INFO] Task succeeded: HMMsearch
[2023-06-18 13:19:28,926] [INFO] Found 6/6 markers.
[2023-06-18 13:19:28,951] [INFO] Query marker FASTA was written to GCA_018971285.1_ASM1897128v1_genomic.fna/markers.fasta
[2023-06-18 13:19:28,951] [INFO] Task started: Blastn
[2023-06-18 13:19:28,951] [INFO] Running command: blastn -query GCA_018971285.1_ASM1897128v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg2580c0d3-be68-43fc-a426-8f2a18f70a3f/dqc_reference/reference_markers.fasta -out GCA_018971285.1_ASM1897128v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-18 13:19:29,740] [INFO] Task succeeded: Blastn
[2023-06-18 13:19:29,746] [INFO] Selected 35 target genomes.
[2023-06-18 13:19:29,746] [INFO] Target genome list was writen to GCA_018971285.1_ASM1897128v1_genomic.fna/target_genomes.txt
[2023-06-18 13:19:29,752] [INFO] Task started: fastANI
[2023-06-18 13:19:29,753] [INFO] Running command: fastANI --query /var/lib/cwl/stga1ba7f8b-9cbd-4fe7-80cb-45239d3d35d0/GCA_018971285.1_ASM1897128v1_genomic.fna.gz --refList GCA_018971285.1_ASM1897128v1_genomic.fna/target_genomes.txt --output GCA_018971285.1_ASM1897128v1_genomic.fna/fastani_result.tsv --threads 1
[2023-06-18 13:19:51,831] [INFO] Task succeeded: fastANI
[2023-06-18 13:19:51,831] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg2580c0d3-be68-43fc-a426-8f2a18f70a3f/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-18 13:19:51,832] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg2580c0d3-be68-43fc-a426-8f2a18f70a3f/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-18 13:19:51,852] [INFO] Found 27 fastANI hits (0 hits with ANI > threshold)
[2023-06-18 13:19:51,852] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-06-18 13:19:51,853] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Azoarcus rhizosphaerae	strain=CC-YHH848	GCA_004801305.1	2565932	2565932	type	True	77.2908	83	411	95	below_threshold
Sulfurimicrobium lacus	strain=skT11	GCA_011764585.1	2715678	2715678	type	True	77.0292	66	411	95	below_threshold
Azoarcus nasutitermitis	strain=CC-YHH838	GCA_004801295.1	2565930	2565930	type	True	76.8188	87	411	95	below_threshold
Cupriavidus taiwanensis	strain=LMG 19424	GCA_000069785.1	164546	164546	suspected-type	True	76.789	78	411	95	below_threshold
Azospira oryzae	strain=DSM 21223	GCA_004217225.1	146939	146939	type	True	76.6459	82	411	95	below_threshold
Chromobacterium alticapitis	strain=MWU14-2602	GCA_002924365.1	2073169	2073169	type	True	76.6442	66	411	95	below_threshold
Cupriavidus neocaledonicus	strain=STM6070	GCA_000372525.1	1040979	1040979	type	True	76.6264	82	411	95	below_threshold
Chromobacterium aquaticum	strain=DSM 19852	GCA_021129195.1	467180	467180	type	True	76.6261	63	411	95	below_threshold
Chromobacterium violaceum	strain=NCTC9757	GCA_900446805.1	536	536	type	True	76.593	74	411	95	below_threshold
Cupriavidus alkaliphilus	strain=ASC-732	GCA_900094595.1	942866	942866	type	True	76.593	86	411	95	below_threshold
Cupriavidus nantongensis	strain=X1	GCA_001598055.1	1796606	1796606	type	True	76.5872	82	411	95	below_threshold
Chromobacterium sphagni	strain=IIBBL 14B-1	GCA_001855555.1	1903179	1903179	type	True	76.5832	68	411	95	below_threshold
Chromobacterium violaceum	strain=ATCC 12472	GCA_000007705.1	536	536	type	True	76.5667	75	411	95	below_threshold
Chromobacterium vaccinii	strain=MWU205	GCA_000971335.1	1108595	1108595	type	True	76.5597	61	411	95	below_threshold
Zoogloea ramigera	strain=NBRC 15342	GCA_006539865.1	350	350	type	True	76.4347	73	411	95	below_threshold
Chromobacterium sinusclupearum	strain=MWU13-2610	GCA_002902845.1	2077146	2077146	type	True	76.4274	68	411	95	below_threshold
Ideonella benzenivorans	strain=B7	GCA_020387415.1	2831643	2831643	type	True	76.4054	74	411	95	below_threshold
Chromobacterium piscinae	strain=DSM 23278	GCA_021129175.1	686831	686831	type	True	76.3877	59	411	95	below_threshold
Cupriavidus cauae	strain=MKL-01	GCA_008632125.1	2608999	2608999	type	True	76.3401	62	411	95	below_threshold
Denitratisoma oestradiolicum	strain=DSM 16959	GCA_007844305.1	311182	311182	type	True	76.2374	58	411	95	below_threshold
Denitratisoma oestradiolicum	strain=DSM 16959	GCA_902813185.1	311182	311182	type	True	76.2374	58	411	95	below_threshold
Rugamonas brunnea	strain=LX20W	GCA_014042345.1	2758569	2758569	type	True	76.2114	87	411	95	below_threshold
Chromobacterium phragmitis	strain=IIBBL 112-1	GCA_003325475.1	2202141	2202141	type	True	76.2081	63	411	95	below_threshold
Vogesella indigofera	strain=DSM 3303	GCA_003633895.1	45465	45465	type	True	76.1691	60	411	95	below_threshold
Massilia norwichensis	strain=LMG 28164	GCA_024753245.1	1442366	1442366	type	True	76.0872	64	411	95	below_threshold
Pseudoduganella buxea	strain=CGMCC 1.15931	GCA_014644155.1	1949069	1949069	type	True	75.8127	64	411	95	below_threshold
Pseudoduganella buxea	strain=KCTC 52429	GCA_009720835.1	1949069	1949069	type	True	75.7215	62	411	95	below_threshold
--------------------------------------------------------------------------------
[2023-06-18 13:19:51,856] [INFO] DFAST Taxonomy check result was written to GCA_018971285.1_ASM1897128v1_genomic.fna/tc_result.tsv
[2023-06-18 13:19:51,856] [INFO] ===== Taxonomy check completed =====
[2023-06-18 13:19:51,856] [INFO] ===== Start completeness check using CheckM =====
[2023-06-18 13:19:51,857] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg2580c0d3-be68-43fc-a426-8f2a18f70a3f/dqc_reference/checkm_data
[2023-06-18 13:19:51,858] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-18 13:19:51,879] [INFO] Task started: CheckM
[2023-06-18 13:19:51,879] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_018971285.1_ASM1897128v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_018971285.1_ASM1897128v1_genomic.fna/checkm_input GCA_018971285.1_ASM1897128v1_genomic.fna/checkm_result
[2023-06-18 13:20:11,708] [INFO] Task succeeded: CheckM
[2023-06-18 13:20:11,709] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 85.65%
Contamintation: 6.94%
Strain heterogeneity: 100.00%
--------------------------------------------------------------------------------
[2023-06-18 13:20:11,738] [INFO] ===== Completeness check finished =====
[2023-06-18 13:20:11,738] [INFO] ===== Start GTDB Search =====
[2023-06-18 13:20:11,738] [INFO] Query marker FASTA already exists. Will reuse it. (GCA_018971285.1_ASM1897128v1_genomic.fna/markers.fasta)
[2023-06-18 13:20:11,739] [INFO] Task started: Blastn
[2023-06-18 13:20:11,739] [INFO] Running command: blastn -query GCA_018971285.1_ASM1897128v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg2580c0d3-be68-43fc-a426-8f2a18f70a3f/dqc_reference/reference_markers_gtdb.fasta -out GCA_018971285.1_ASM1897128v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-18 13:20:13,121] [INFO] Task succeeded: Blastn
[2023-06-18 13:20:13,135] [INFO] Selected 28 target genomes.
[2023-06-18 13:20:13,135] [INFO] Target genome list was writen to GCA_018971285.1_ASM1897128v1_genomic.fna/target_genomes_gtdb.txt
[2023-06-18 13:20:13,172] [INFO] Task started: fastANI
[2023-06-18 13:20:13,172] [INFO] Running command: fastANI --query /var/lib/cwl/stga1ba7f8b-9cbd-4fe7-80cb-45239d3d35d0/GCA_018971285.1_ASM1897128v1_genomic.fna.gz --refList GCA_018971285.1_ASM1897128v1_genomic.fna/target_genomes_gtdb.txt --output GCA_018971285.1_ASM1897128v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-18 13:20:32,308] [INFO] Task succeeded: fastANI
[2023-06-18 13:20:32,325] [INFO] Found 21 fastANI hits (0 hits with ANI > circumscription radius)
[2023-06-18 13:20:32,326] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_004801305.1	s__Thauera_A rhizosphaerae	77.2756	82	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Thauera_A	95.0	N/A	N/A	N/A	N/A	1	-
GCA_903863375.1	s__CAIVHB01 sp903863375	77.2122	80	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Ferrovaceae;g__CAIVHB01	95.0	99.77	99.77	0.92	0.92	2	-
GCA_016183615.1	s__JACPEG01 sp016183615	76.9379	74	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__UBA6910;g__JACPEG01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_903893215.1	s__CAIVHB01 sp903893215	76.6772	76	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Ferrovaceae;g__CAIVHB01	95.0	99.91	99.87	0.93	0.92	3	-
GCF_002924365.1	s__Chromobacterium sp002924365	76.6442	66	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Chromobacteriaceae;g__Chromobacterium	95.0	N/A	N/A	N/A	N/A	1	-
GCA_018336195.1	s__JAGXQV01 sp018336195	76.6003	56	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Sulfuricellaceae;g__JAGXQV01	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900094595.1	s__Cupriavidus alkaliphilus	76.5961	86	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Burkholderiaceae;g__Cupriavidus	95.0	96.96	96.04	0.91	0.89	16	-
GCF_001598055.1	s__Cupriavidus nantongensis	76.5872	82	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Burkholderiaceae;g__Cupriavidus	95.0	95.73	95.73	0.81	0.78	3	-
GCF_900249755.1	s__Cupriavidus taiwanensis_D	76.4956	78	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Burkholderiaceae;g__Cupriavidus	95.0	98.80	98.80	0.92	0.92	2	-
GCF_007833355.1	s__Denitratisoma sp007833355	76.4733	81	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Denitratisoma	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000876015.1	s__Cupriavidus basilensis_B	76.4533	98	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Burkholderiaceae;g__Cupriavidus	95.0	98.55	98.55	0.88	0.88	2	-
GCA_016219425.1	s__PFJX01 sp016219425	76.4263	63	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Thiobacillaceae;g__PFJX01	95.0	N/A	N/A	N/A	N/A	1	-
GCF_018119395.1	s__Massilia sp018119395	76.2617	64	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Burkholderiaceae;g__Massilia	95.0	99.25	99.25	0.95	0.95	2	-
GCF_000620105.1	s__Microvirgula aerodenitrificans	76.2216	57	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Aquaspirillaceae;g__Microvirgula	95.0	99.13	99.12	0.93	0.93	3	-
GCF_003325475.1	s__Chromobacterium phragmitis	76.2081	63	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Chromobacteriaceae;g__Chromobacterium	95.0	99.00	99.00	0.92	0.92	2	-
GCF_000423285.1	s__Laribacter hongkongensis	76.1774	62	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Aquaspirillaceae;g__Laribacter	95.0	98.14	97.82	0.92	0.90	5	-
GCF_000711885.1	s__Chromobacterium haemolyticum	76.1337	63	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Chromobacteriaceae;g__Chromobacterium	95.0	96.93	96.05	0.88	0.85	14	-
GCA_903909415.1	s__CAIVVS01 sp903909415	76.1019	62	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__SG8-39;g__CAIVVS01	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003202035.1	s__Aquitalea magnusonii	76.0847	67	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Chromobacteriaceae;g__Aquitalea	95.0	99.21	98.53	0.95	0.91	3	-
GCF_003416985.1	s__Duganella sp003416985	76.0118	70	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Burkholderiaceae;g__Duganella	95.0	97.63	95.41	0.92	0.79	7	-
GCF_003416895.1	s__Duganella sp003416895	76.0101	63	411	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Burkholderiaceae;g__Duganella	95.0	99.92	99.92	0.99	0.99	2	-
--------------------------------------------------------------------------------
[2023-06-18 13:20:32,328] [INFO] GTDB search result was written to GCA_018971285.1_ASM1897128v1_genomic.fna/result_gtdb.tsv
[2023-06-18 13:20:32,330] [INFO] ===== GTDB Search completed =====
[2023-06-18 13:20:32,336] [INFO] DFAST_QC result json was written to GCA_018971285.1_ASM1897128v1_genomic.fna/dqc_result.json
[2023-06-18 13:20:32,336] [INFO] DFAST_QC completed!
[2023-06-18 13:20:32,336] [INFO] Total running time: 0h1m9s