[2023-06-16 20:50:18,564] [INFO] DFAST_QC pipeline started.
[2023-06-16 20:50:18,569] [INFO] DFAST_QC version: 0.5.7
[2023-06-16 20:50:18,569] [INFO] DQC Reference Directory: /var/lib/cwl/stg2e83da46-30bd-4407-9deb-fd92194f35e4/dqc_reference
[2023-06-16 20:50:19,769] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-16 20:50:19,769] [INFO] Task started: Prodigal
[2023-06-16 20:50:19,770] [INFO] Running command: gunzip -c /var/lib/cwl/stg76a6bf3f-3ffc-4b33-812b-8d5388be1374/GCA_009928075.1_ASM992807v1_genomic.fna.gz | prodigal -d GCA_009928075.1_ASM992807v1_genomic.fna/cds.fna -a GCA_009928075.1_ASM992807v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-16 20:50:24,772] [INFO] Task succeeded: Prodigal
[2023-06-16 20:50:24,772] [INFO] Task started: HMMsearch
[2023-06-16 20:50:24,772] [INFO] Running command: hmmsearch --tblout GCA_009928075.1_ASM992807v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg2e83da46-30bd-4407-9deb-fd92194f35e4/dqc_reference/reference_markers.hmm GCA_009928075.1_ASM992807v1_genomic.fna/protein.faa > /dev/null
[2023-06-16 20:50:24,941] [INFO] Task succeeded: HMMsearch
[2023-06-16 20:50:24,942] [WARNING] Found 5/6 markers. [/var/lib/cwl/stg76a6bf3f-3ffc-4b33-812b-8d5388be1374/GCA_009928075.1_ASM992807v1_genomic.fna.gz]
[2023-06-16 20:50:24,964] [INFO] Query marker FASTA was written to GCA_009928075.1_ASM992807v1_genomic.fna/markers.fasta
[2023-06-16 20:50:24,964] [INFO] Task started: Blastn
[2023-06-16 20:50:24,964] [INFO] Running command: blastn -query GCA_009928075.1_ASM992807v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg2e83da46-30bd-4407-9deb-fd92194f35e4/dqc_reference/reference_markers.fasta -out GCA_009928075.1_ASM992807v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-16 20:50:25,766] [INFO] Task succeeded: Blastn
[2023-06-16 20:50:25,770] [INFO] Selected 23 target genomes.
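The "Selected 23 target genomes" step follows a BLAST search of the query's marker genes against the reference marker set, written as BLAST+ tabular output (`-outfmt 6`, at most five alignments per query). Below is a minimal sketch of how such a target list could be collected from that table. It assumes reference marker IDs of the hypothetical form `<genome_accession>|<marker>`; DFAST_QC's real ID scheme and selection rules are not shown in this log, and `genome_of` and `select_target_genomes` are illustrative names, not DFAST_QC functions.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def genome_of(sseqid: str) -> str:
    # Hypothetical ID scheme (an assumption): "<genome_accession>|<marker>".
    return sseqid.split("|", 1)[0]


def select_target_genomes(blast_tsv: str) -> list[str]:
    """Collect unique reference genomes hit by any query marker, in first-seen order."""
    seen: dict[str, None] = {}  # dict keeps insertion order, acting as an ordered set
    rows = csv.DictReader(io.StringIO(blast_tsv), fieldnames=OUTFMT6_FIELDS, delimiter="\t")
    for row in rows:
        seen.setdefault(genome_of(row["sseqid"]), None)
    return list(seen)
```

Deduplicating by genome matters because several markers usually hit the same reference genome; with 5 markers and up to 5 alignments each, this run could see at most 25 hits, which collapsed to 23 genomes.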
[2023-06-16 20:50:25,770] [INFO] Target genome list was written to GCA_009928075.1_ASM992807v1_genomic.fna/target_genomes.txt
[2023-06-16 20:50:25,791] [INFO] Task started: fastANI
[2023-06-16 20:50:25,791] [INFO] Running command: fastANI --query /var/lib/cwl/stg76a6bf3f-3ffc-4b33-812b-8d5388be1374/GCA_009928075.1_ASM992807v1_genomic.fna.gz --refList GCA_009928075.1_ASM992807v1_genomic.fna/target_genomes.txt --output GCA_009928075.1_ASM992807v1_genomic.fna/fastani_result.tsv --threads 1
[2023-06-16 20:50:39,885] [INFO] Task succeeded: fastANI
[2023-06-16 20:50:39,885] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg2e83da46-30bd-4407-9deb-fd92194f35e4/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-16 20:50:39,885] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg2e83da46-30bd-4407-9deb-fd92194f35e4/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-16 20:50:39,901] [INFO] Found 20 fastANI hits (0 hits with ANI > threshold)
[2023-06-16 20:50:39,901] [INFO] The taxonomy check result is classified as 'below_threshold'.
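The taxonomy check compares each fastANI hit against an ANI threshold. Because the species-specific threshold file was missing, the default (95, as shown in the `ani_threshold` column of the final result) applied to every species, and no hit exceeded it. A minimal sketch of that comparison follows, assuming fastANI's standard five-column output (query, reference, ANI, mapped fragments, total query fragments). The function names are hypothetical; this is not DFAST_QC's own code.

```python
import csv
import io
from dataclasses import dataclass

DEFAULT_ANI_THRESHOLD = 95.0  # value in the ani_threshold column of this run


@dataclass
class AniHit:
    reference: str
    ani: float
    matched_fragments: int
    total_fragments: int


def parse_fastani(tsv: str) -> list[AniHit]:
    """Parse fastANI output: query, reference, ANI, mapped fragments, total query fragments."""
    hits = []
    for row in csv.reader(io.StringIO(tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        _query, reference, ani, matched, total = row[:5]
        hits.append(AniHit(reference, float(ani), int(matched), int(total)))
    return hits


def hits_above_threshold(hits: list[AniHit], threshold: float = DEFAULT_ANI_THRESHOLD) -> list[AniHit]:
    """Keep hits whose ANI is strictly greater than the threshold ("ANI > threshold")."""
    return [h for h in hits if h.ani > threshold]


def summary(hits: list[AniHit], threshold: float = DEFAULT_ANI_THRESHOLD) -> str:
    # Same counts as the log line "Found N fastANI hits (M hits with ANI > threshold)".
    above = hits_above_threshold(hits, threshold)
    return f"Found {len(hits)} fastANI hits ({len(above)} hits with ANI > threshold)"
```

An empty result from `hits_above_threshold` corresponds to this run's 'below_threshold' classification; the log does not show how DFAST_QC labels the other outcomes, so the sketch stops at counting.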
[2023-06-16 20:50:39,901] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Thauera phenylacetica	strain=B4P	GCA_000310225.1	164400	164400	type	True	77.785	88	393	95	below_threshold
Sulfurisoma sediminicola	strain=DSM 26916	GCA_003663955.1	1381557	1381557	type	True	77.6395	75	393	95	below_threshold
Sulfurisoma sediminicola	strain=BSN1	GCA_003865015.1	1381557	1381557	type	True	77.6189	74	393	95	below_threshold
Thauera aminoaromatica	strain=S2	GCA_000310185.1	164330	164330	type	True	77.371	81	393	95	below_threshold
Rhodocyclus purpureus	strain=DSM 168	GCA_016653115.1	1067	1067	type	True	77.3212	85	393	95	below_threshold
Aromatoleum tolulyticum	strain=ATCC 51758	GCA_900156155.1	34027	34027	type	True	77.2403	63	393	95	below_threshold
Rhodocyclus tenuis	strain=2761	GCA_014197755.1	1066	1066	type	True	77.1315	80	393	95	below_threshold
Rhodocyclus tenuis	strain=2761	GCA_009469755.1	1066	1066	type	True	77.0653	78	393	95	below_threshold
Thauera chlorobenzoica	strain=3CB-1	GCA_900108255.1	96773	96773	type	True	77.0259	80	393	95	below_threshold
Thauera chlorobenzoica	strain=3CB1	GCA_001922305.1	96773	96773	type	True	76.9939	81	393	95	below_threshold
Azoarcus rhizosphaerae	strain=CC-YHH848	GCA_004801305.1	2565932	2565932	type	True	76.9533	79	393	95	below_threshold
Chromobacterium sinusclupearum	strain=MWU13-2610	GCA_002902845.1	2077146	2077146	type	True	76.8174	62	393	95	below_threshold
Chromobacterium amazonense	strain=DSM 26508	GCA_001855565.1	1382803	1382803	type	True	76.7612	62	393	95	below_threshold
Chromobacterium aquaticum	strain=DSM 19852	GCA_021129195.1	467180	467180	type	True	76.7594	59	393	95	below_threshold
Zoogloea ramigera	strain=NBRC 15342	GCA_006539865.1	350	350	type	True	76.7483	80	393	95	below_threshold
Azoarcus nasutitermitis	strain=CC-YHH838	GCA_004801295.1	2565930	2565930	type	True	76.668	81	393	95	below_threshold
Chromobacterium alkanivorans	strain=IITR-71	GCA_016937655.1	1071719	1071719	type	True	76.6427	69	393	95	below_threshold
Aromatoleum toluvorans	strain=Td21	GCA_012910905.1	92002	92002	type	True	76.5814	80	393	95	below_threshold
Vogesella alkaliphila	strain=KCTC 32041	GCA_014652475.1	1193621	1193621	type	True	76.4886	67	393	95	below_threshold
Burkholderia perseverans	strain=INN12	GCA_022870505.1	2615214	2615214	type	True	76.2677	57	393	95	below_threshold
--------------------------------------------------------------------------------
[2023-06-16 20:50:39,903] [INFO] DFAST Taxonomy check result was written to GCA_009928075.1_ASM992807v1_genomic.fna/tc_result.tsv
[2023-06-16 20:50:39,904] [INFO] ===== Taxonomy check completed =====
[2023-06-16 20:50:39,904] [INFO] ===== Start completeness check using CheckM =====
[2023-06-16 20:50:39,904] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg2e83da46-30bd-4407-9deb-fd92194f35e4/dqc_reference/checkm_data
[2023-06-16 20:50:39,905] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-16 20:50:39,934] [INFO] Task started: CheckM
[2023-06-16 20:50:39,935] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_009928075.1_ASM992807v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_009928075.1_ASM992807v1_genomic.fna/checkm_input GCA_009928075.1_ASM992807v1_genomic.fna/checkm_result
[2023-06-16 20:51:00,516] [INFO] Task succeeded: CheckM
[2023-06-16 20:51:00,517] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 88.74%
Contamination: 0.52%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-06-16 20:51:00,535] [INFO] ===== Completeness check finished =====
[2023-06-16 20:51:00,536] [INFO] ===== Start GTDB Search =====
[2023-06-16 20:51:00,536] [INFO] Query marker FASTA already exists. Will reuse it. (GCA_009928075.1_ASM992807v1_genomic.fna/markers.fasta)
[2023-06-16 20:51:00,536] [INFO] Task started: Blastn
[2023-06-16 20:51:00,536] [INFO] Running command: blastn -query GCA_009928075.1_ASM992807v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg2e83da46-30bd-4407-9deb-fd92194f35e4/dqc_reference/reference_markers_gtdb.fasta -out GCA_009928075.1_ASM992807v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-16 20:51:01,925] [INFO] Task succeeded: Blastn
[2023-06-16 20:51:01,929] [INFO] Selected 20 target genomes.
[2023-06-16 20:51:01,929] [INFO] Target genome list was written to GCA_009928075.1_ASM992807v1_genomic.fna/target_genomes_gtdb.txt
[2023-06-16 20:51:01,951] [INFO] Task started: fastANI
[2023-06-16 20:51:01,951] [INFO] Running command: fastANI --query /var/lib/cwl/stg76a6bf3f-3ffc-4b33-812b-8d5388be1374/GCA_009928075.1_ASM992807v1_genomic.fna.gz --refList GCA_009928075.1_ASM992807v1_genomic.fna/target_genomes_gtdb.txt --output GCA_009928075.1_ASM992807v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-16 20:51:11,192] [INFO] Task succeeded: fastANI
[2023-06-16 20:51:11,205] [INFO] Found 14 fastANI hits (1 hits with ANI > circumscription radius)
[2023-06-16 20:51:11,206] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCA_009927815.1	s__UBA3065 sp002367415	99.0598	298	393	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__UBA3065;g__UBA3065	95.0	99.03	98.97	0.84	0.74	6	conclusive
GCA_018882975.1	s__UBA3065 sp018882975	81.3426	220	393	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__UBA3065;g__UBA3065	95.0	N/A	N/A	N/A	N/A	1	-
GCA_009923755.1	s__UBA3065 sp009923755	81.0553	203	393	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__UBA3065;g__UBA3065	95.0	N/A	N/A	N/A	N/A	1	-
GCF_016858125.1	s__Azospira_A restricta	77.9357	91	393	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Azospira_A	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000012745.1	s__Thiobacillus denitrificans_B	77.6293	70	393	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Thiobacillaceae;g__Thiobacillus	95.0	N/A	N/A	N/A	N/A	1	-
GCA_003446655.1	s__Thauera sp003446655	77.5185	54	393	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Thauera	95.0	99.50	99.50	0.90	0.90	2	-
GCA_016183655.1	s__Azospira_A sp016183655	77.3714	87	393	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Azospira_A	95.0	N/A	N/A	N/A	N/A	1	-
GCF_016653115.1	s__Rhodocyclus purpureus	77.3212	85	393	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Rhodocyclus	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016720425.1	s__CAIWHR01 sp016720425	77.1771	80	393	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Casimicrobiaceae;g__CAIWHR01	95.0	98.60	97.38	0.93	0.90	10	-
GCF_004337445.1	s__Parasulfuritortus cantonensis	76.9456	65	393	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Thiobacillaceae;g__Parasulfuritortus	95.0	N/A	N/A	N/A	N/A	1	-
GCF_009469595.1	s__Thauera sp009469595	76.8151	81	393	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Thauera	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003628555.1	s__Aromatoleum sp003628555	76.6282	79	393	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Aromatoleum	95.0	N/A	N/A	N/A	N/A	1	-
GCA_017302295.1	s__Thauera sp017302295	76.5518	79	393	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Thauera	95.0	N/A	N/A	N/A	N/A	1	-
GCA_018240785.1	s__Thauera sp018240785	76.5228	79	393	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Rhodocyclaceae;g__Thauera	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2023-06-16 20:51:11,208] [INFO] GTDB search result was written to GCA_009928075.1_ASM992807v1_genomic.fna/result_gtdb.tsv
[2023-06-16 20:51:11,208] [INFO] ===== GTDB Search completed =====
[2023-06-16 20:51:11,212] [INFO] DFAST_QC result json was written to GCA_009928075.1_ASM992807v1_genomic.fna/dqc_result.json
[2023-06-16 20:51:11,213] [INFO] DFAST_QC completed!
[2023-06-16 20:51:11,213] [INFO] Total running time: 0h0m53s
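In the GTDB search, a hit counts when its ANI exceeds the species' circumscription radius (95.0 for every species listed here). Exactly one hit qualified in this run, and that row is marked `conclusive`. The sketch below reproduces that check on values copied from the GTDB table. How DFAST_QC labels zero or several qualifying hits is not shown in this log, so the sketch simply returns `None` in those cases; `GtdbHit` and `conclusive_species` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GtdbHit:
    accession: str
    gtdb_species: str
    ani: float
    matched_fragments: int
    total_fragments: int
    radius: float  # ani_circumscription_radius column

    @property
    def aligned_fraction(self) -> float:
        # Share of query fragments that mapped to this reference genome.
        return self.matched_fragments / self.total_fragments


def conclusive_species(hits: list[GtdbHit]) -> Optional[str]:
    """Return the species if exactly one hit has ANI above its radius, else None.

    Assumption: this mirrors only the single 'conclusive' row seen in this log.
    """
    within = [h for h in hits if h.ani > h.radius]
    return within[0].gtdb_species if len(within) == 1 else None


# First three rows of the GTDB search result above.
hits = [
    GtdbHit("GCA_009927815.1", "s__UBA3065 sp002367415", 99.0598, 298, 393, 95.0),
    GtdbHit("GCA_018882975.1", "s__UBA3065 sp018882975", 81.3426, 220, 393, 95.0),
    GtdbHit("GCA_009923755.1", "s__UBA3065 sp009923755", 81.0553, 203, 393, 95.0),
]
print(conclusive_species(hits))            # s__UBA3065 sp002367415
print(round(hits[0].aligned_fraction, 3))  # 0.758
```

The aligned fraction (298 of 393 query fragments, about 0.758) is shown alongside ANI because a high ANI over only a small share of the genome is weaker evidence than the same ANI over most of it.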