[2023-06-28 21:58:15,655] [INFO] DFAST_QC pipeline started.
[2023-06-28 21:58:15,664] [INFO] DFAST_QC version: 0.5.7
[2023-06-28 21:58:15,664] [INFO] DQC Reference Directory: /var/lib/cwl/stg597bcd12-5f47-4f97-af37-c97fc1976646/dqc_reference
[2023-06-28 21:58:16,911] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-28 21:58:16,917] [INFO] Task started: Prodigal
[2023-06-28 21:58:16,917] [INFO] Running command: gunzip -c /var/lib/cwl/stg605b1c85-0641-46f8-96f3-988d415531e1/GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna.gz | prodigal -d GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/cds.fna -a GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-28 21:58:24,746] [INFO] Task succeeded: Prodigal
[2023-06-28 21:58:24,747] [INFO] Task started: HMMsearch
[2023-06-28 21:58:24,747] [INFO] Running command: hmmsearch --tblout GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg597bcd12-5f47-4f97-af37-c97fc1976646/dqc_reference/reference_markers.hmm GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/protein.faa > /dev/null
[2023-06-28 21:58:24,993] [INFO] Task succeeded: HMMsearch
[2023-06-28 21:58:24,994] [WARNING] Found 5/6 markers. [/var/lib/cwl/stg605b1c85-0641-46f8-96f3-988d415531e1/GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna.gz]
[2023-06-28 21:58:25,032] [INFO] Query marker FASTA was written to GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/markers.fasta
[2023-06-28 21:58:25,032] [INFO] Task started: Blastn
[2023-06-28 21:58:25,032] [INFO] Running command: blastn -query GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/markers.fasta -db /var/lib/cwl/stg597bcd12-5f47-4f97-af37-c97fc1976646/dqc_reference/reference_markers.fasta -out GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-28 21:58:25,845] [INFO] Task succeeded: Blastn
[2023-06-28 21:58:25,849] [INFO] Selected 20 target genomes.
[2023-06-28 21:58:25,850] [INFO] Target genome list was writen to GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/target_genomes.txt
[2023-06-28 21:58:25,884] [INFO] Task started: fastANI
[2023-06-28 21:58:25,884] [INFO] Running command: fastANI --query /var/lib/cwl/stg605b1c85-0641-46f8-96f3-988d415531e1/GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna.gz --refList GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/target_genomes.txt --output GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/fastani_result.tsv --threads 1
[2023-06-28 21:58:40,515] [INFO] Task succeeded: fastANI
[2023-06-28 21:58:40,515] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg597bcd12-5f47-4f97-af37-c97fc1976646/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-28 21:58:40,516] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg597bcd12-5f47-4f97-af37-c97fc1976646/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-28 21:58:40,532] [INFO] Found 20 fastANI hits (1 hits with ANI > threshold)
[2023-06-28 21:58:40,533] [INFO] The taxonomy check result is classified as 'conclusive'.
[2023-06-28 21:58:40,533] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Novosphingobium fluoreni	strain=DSM 27568	GCA_014196615.1	1391222	1391222	type	True	97.0214	426	742	95	conclusive
Novosphingobium lindaniclasticum	strain=LE124	GCA_000445125.1	1329895	1329895	type	True	79.2411	193	742	95	below_threshold
Novosphingobium colocasiae	strain=KCTC 32255	GCA_014652555.1	1256513	1256513	type	True	79.1375	197	742	95	below_threshold
Novosphingobium barchaimii	strain=LL02	GCA_001046635.1	1420591	1420591	type	True	79.0925	221	742	95	below_threshold
Novosphingobium pentaromativorans	strain=US6-1	GCA_000235975.2	205844	205844	type	True	79.0629	178	742	95	below_threshold
Novosphingobium pentaromativorans	strain=US6-1	GCA_000767465.1	205844	205844	type	True	79.0003	183	742	95	below_threshold
Novosphingobium guangzhouense	strain=SA925	GCA_002896965.1	1850347	1850347	type	True	78.922	214	742	95	below_threshold
Novosphingobium silvae	strain=FGD1	GCA_009856825.1	2692619	2692619	type	True	78.8878	197	742	95	below_threshold
Novosphingobium indicum	strain=CGMCC 1.6784	GCA_014645195.1	462949	462949	type	True	78.8872	173	742	95	below_threshold
Novosphingobium mathurense	strain=SM117	GCA_900168325.1	428990	428990	type	True	78.828	184	742	95	below_threshold
Novosphingobium decolorationis	strain=502str22	GCA_018417475.1	2698673	2698673	type	True	78.8151	175	742	95	below_threshold
Novosphingobium arvoryzae	strain=KCTC 32422	GCA_014652615.1	1256514	1256514	type	True	78.6937	165	742	95	below_threshold
Novosphingobium percolationis	strain=c1	GCA_020179425.1	2871811	2871811	type	True	78.6586	174	742	95	below_threshold
Novosphingobium aquimarinum	strain=M24A2M	GCA_009746585.1	2682494	2682494	type	True	78.6554	181	742	95	below_threshold
Novosphingobium aureum	strain=YJ-S2-02	GCA_015865035.1	2792964	2792964	type	True	78.6518	182	742	95	below_threshold
Novosphingobium lentum	strain=NBRC 107847	GCA_001590965.1	145287	145287	type	True	78.4078	178	742	95	below_threshold
Novosphingobium huizhouense	strain=c7	GCA_020179475.1	2866625	2866625	type	True	78.2986	169	742	95	below_threshold
Erythrobacter colymbi	strain=JCM 18338	GCA_002155685.1	1161202	1161202	type	True	77.5944	131	742	95	below_threshold
Sphingomonas gei	strain=ZFGT-11	GCA_004792685.1	1395960	1395960	type	True	76.9998	109	742	95	below_threshold
Sphingomonas baiyangensis	strain=L-1-4 w-11	GCA_005144715.1	2572576	2572576	type	True	76.9601	114	742	95	below_threshold
--------------------------------------------------------------------------------
[2023-06-28 21:58:40,535] [INFO] DFAST Taxonomy check result was written to GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/tc_result.tsv
[2023-06-28 21:58:40,536] [INFO] ===== Taxonomy check completed =====
[2023-06-28 21:58:40,536] [INFO] ===== Start completeness check using CheckM =====
[2023-06-28 21:58:40,536] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg597bcd12-5f47-4f97-af37-c97fc1976646/dqc_reference/checkm_data
[2023-06-28 21:58:40,538] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-28 21:58:40,585] [INFO] Task started: CheckM
[2023-06-28 21:58:40,586] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/checkm_input GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/checkm_result
[2023-06-28 21:59:08,512] [INFO] Task succeeded: CheckM
[2023-06-28 21:59:08,514] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 58.33%
Contamintation: 0.38%
Strain heterogeneity: 100.00%
--------------------------------------------------------------------------------
[2023-06-28 21:59:08,538] [INFO] ===== Completeness check finished =====
[2023-06-28 21:59:08,538] [INFO] ===== Start GTDB Search =====
[2023-06-28 21:59:08,539] [INFO] Query marker FASTA already exists. Will reuse it. (GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/markers.fasta)
[2023-06-28 21:59:08,539] [INFO] Task started: Blastn
[2023-06-28 21:59:08,539] [INFO] Running command: blastn -query GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/markers.fasta -db /var/lib/cwl/stg597bcd12-5f47-4f97-af37-c97fc1976646/dqc_reference/reference_markers_gtdb.fasta -out GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-28 21:59:09,830] [INFO] Task succeeded: Blastn
[2023-06-28 21:59:09,835] [INFO] Selected 14 target genomes.
[2023-06-28 21:59:09,835] [INFO] Target genome list was writen to GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/target_genomes_gtdb.txt
[2023-06-28 21:59:09,848] [INFO] Task started: fastANI
[2023-06-28 21:59:09,848] [INFO] Running command: fastANI --query /var/lib/cwl/stg605b1c85-0641-46f8-96f3-988d415531e1/GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna.gz --refList GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/target_genomes_gtdb.txt --output GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-28 21:59:20,520] [INFO] Task succeeded: fastANI
[2023-06-28 21:59:20,540] [INFO] Found 14 fastANI hits (1 hits with ANI > circumscription radius)
[2023-06-28 21:59:20,540] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_014196615.1	s__Novosphingobium fluoreni	97.0224	426	742	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Novosphingobium	95.0	97.16	97.16	0.90	0.90	2	conclusive
GCF_902506425.1	s__Novosphingobium sp902506425	81.4392	277	742	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Novosphingobium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_014204535.1	s__Novosphingobium chloroacetimidivorans	81.4036	286	742	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Novosphingobium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000272475.1	s__Novosphingobium sp000272475	79.2692	206	742	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Novosphingobium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000281975.1	s__Novosphingobium sp000281975	79.2424	209	742	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Novosphingobium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_006874585.1	s__Novosphingobium sp006874585	79.1837	216	742	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Novosphingobium	95.0	98.32	98.32	0.81	0.81	2	-
GCF_001742225.1	s__Novosphingobium resinovorum	79.1208	232	742	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Novosphingobium	95.0	98.22	97.83	0.80	0.78	4	-
GCF_001046635.1	s__Novosphingobium barchaimii	79.073	222	742	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Novosphingobium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002556635.1	s__Novosphingobium sp002556635	78.95	195	742	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Novosphingobium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_014645195.1	s__Novosphingobium indicum	78.8622	174	742	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Novosphingobium	95.0	97.92	97.92	0.77	0.77	2	-
GCA_004211435.1	s__Novosphingobium sp004211435	78.7774	168	742	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Novosphingobium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_019075875.1	s__Novosphingobium sp019075875	78.2934	194	742	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Novosphingobium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002155685.1	s__Erythrobacter colymbi	77.6149	130	742	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Erythrobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCF_014641655.1	s__Alteriqipengyuania_A marina	77.5993	99	742	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Sphingomonadales;f__Sphingomonadaceae;g__Alteriqipengyuania_A	95.0	98.02	98.02	0.89	0.89	2	-
--------------------------------------------------------------------------------
[2023-06-28 21:59:20,542] [INFO] GTDB search result was written to GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/result_gtdb.tsv
[2023-06-28 21:59:20,542] [INFO] ===== GTDB Search completed =====
[2023-06-28 21:59:20,546] [INFO] DFAST_QC result json was written to GCA_913778485.1_SP287_3_metabat2_genome_mining.19_cleaned_genomic.fna/dqc_result.json
[2023-06-28 21:59:20,547] [INFO] DFAST_QC completed!
[2023-06-28 21:59:20,547] [INFO] Total running time: 0h1m5s