[2023-03-15 17:39:54,920] [INFO] DFAST_QC pipeline started.
[2023-03-15 17:39:54,920] [INFO] DFAST_QC version: 0.5.7
[2023-03-15 17:39:54,920] [INFO] DQC Reference Directory: /var/lib/cwl/stgaf83e5ed-5d1a-4d5c-8968-2e245cd10363/dqc_reference
[2023-03-15 17:39:57,554] [INFO] ===== Start taxonomy check using ANI =====
[2023-03-15 17:39:57,554] [INFO] Task started: Prodigal
[2023-03-15 17:39:57,554] [INFO] Running command: cat /var/lib/cwl/stg8e4ef203-ea95-42e6-b5b6-6666fbc9155f/OceanDNA-b23834.fa | prodigal -d OceanDNA-b23834/cds.fna -a OceanDNA-b23834/protein.faa -g 11 -q > /dev/null
[2023-03-15 17:40:14,784] [INFO] Task succeeded: Prodigal
[2023-03-15 17:40:14,784] [INFO] Task started: HMMsearch
[2023-03-15 17:40:14,785] [INFO] Running command: hmmsearch --tblout OceanDNA-b23834/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stgaf83e5ed-5d1a-4d5c-8968-2e245cd10363/dqc_reference/reference_markers.hmm OceanDNA-b23834/protein.faa > /dev/null
[2023-03-15 17:40:14,970] [INFO] Task succeeded: HMMsearch
[2023-03-15 17:40:14,970] [INFO] Found 6/6 markers.
[2023-03-15 17:40:14,989] [INFO] Query marker FASTA was written to OceanDNA-b23834/markers.fasta
[2023-03-15 17:40:14,989] [INFO] Task started: Blastn
[2023-03-15 17:40:14,989] [INFO] Running command: blastn -query OceanDNA-b23834/markers.fasta -db /var/lib/cwl/stgaf83e5ed-5d1a-4d5c-8968-2e245cd10363/dqc_reference/reference_markers.fasta -out OceanDNA-b23834/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-03-15 17:40:15,826] [INFO] Task succeeded: Blastn
[2023-03-15 17:40:15,827] [INFO] Selected 22 target genomes.
[2023-03-15 17:40:15,828] [INFO] Target genome list was writen to OceanDNA-b23834/target_genomes.txt
[2023-03-15 17:40:15,841] [INFO] Task started: fastANI
[2023-03-15 17:40:15,841] [INFO] Running command: fastANI --query /var/lib/cwl/stg8e4ef203-ea95-42e6-b5b6-6666fbc9155f/OceanDNA-b23834.fa --refList OceanDNA-b23834/target_genomes.txt --output OceanDNA-b23834/fastani_result.tsv --threads 1
[2023-03-15 17:40:29,866] [INFO] Task succeeded: fastANI
[2023-03-15 17:40:29,866] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stgaf83e5ed-5d1a-4d5c-8968-2e245cd10363/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-03-15 17:40:29,867] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stgaf83e5ed-5d1a-4d5c-8968-2e245cd10363/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-03-15 17:40:29,880] [INFO] Found 21 fastANI hits (0 hits with ANI > threshold)
[2023-03-15 17:40:29,880] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-03-15 17:40:29,880] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Maricaulis maris	strain=DSM 4734	GCA_003634045.1	74318	74318	type	True	79.2149	433	920	95	below_threshold
Maricaulis alexandrii	strain=LZ-16-1	GCA_005871165.1	2570354	2570354	type	True	79.1727	484	920	95	below_threshold
Maricaulis salignorans	strain=DSM 16077	GCA_900103475.1	144026	144026	type	True	78.593	393	920	95	below_threshold
Marinicauda pacifica	strain=P-1 km-3	GCA_004793635.1	1133559	1133559	type	True	77.2316	182	920	95	below_threshold
Marinicauda pacifica	strain=CGMCC 1.11031	GCA_014636415.1	1133559	1133559	type	True	77.2159	183	920	95	below_threshold
Marinicauda algicola	strain=RMAR8-3	GCA_017161425.1	2029849	2029849	type	True	77.1635	193	920	95	below_threshold
Marinicauda algicola	strain=JCM 31718	GCA_004793685.1	2029849	2029849	type	True	77.0809	194	920	95	below_threshold
Marinicauda salina	strain=WD6-1	GCA_003122085.1	2135793	2135793	type	True	77.0724	190	920	95	below_threshold
Glycocaulis albus	strain=CGMCC 1.12766	GCA_014639075.1	1382801	1382801	type	True	76.8452	158	920	95	below_threshold
Pyruvatibacter mobilis	strain=GYP-11	GCA_009910475.1	1712261	1712261	type	True	76.7764	74	920	95	below_threshold
Pyruvatibacter mobilis	strain=CGMCC 1.15125	GCA_012848855.1	1712261	1712261	type	True	76.7456	75	920	95	below_threshold
Pyruvatibacter mobilis	strain=CGMCC 1.15125	GCA_014640905.1	1712261	1712261	type	True	76.7388	75	920	95	below_threshold
Brevundimonas alba	strain=DSM 4736	GCA_011927945.1	74314	74314	type	True	76.3365	71	920	95	below_threshold
Parvibaculum indicum	strain=DSM 25305	GCA_011762095.1	562969	562969	type	True	76.0636	74	920	95	below_threshold
Aurantimonas endophytica	strain=KCTC 52296	GCA_024105745.1	1522175	1522175	type	True	76.0333	73	920	95	below_threshold
Bradyrhizobium acaciae	strain=10BB	GCA_020889785.1	2683706	2683706	type	True	75.9979	81	920	95	below_threshold
Bradyrhizobium oropedii	strain=Pear76	GCA_020889685.1	1571201	1571201	type	True	75.9677	83	920	95	below_threshold
Xanthobacter aminoxidans	strain=ATCC BAA-299	GCA_023571765.1	186280	186280	type	True	75.7495	82	920	95	below_threshold
Xanthobacter oligotrophicus	strain=29k	GCA_008364685.1	2607286	2607286	type	True	75.7394	73	920	95	below_threshold
Sphingomonas suaedae	strain=XS-10	GCA_007833215.1	2599297	2599297	type	True	75.6579	56	920	95	below_threshold
Roseomonas rubea	strain=MO17	GCA_016106015.1	2748666	2748666	type	True	75.4203	54	920	95	below_threshold
--------------------------------------------------------------------------------
[2023-03-15 17:40:29,880] [INFO] DFAST Taxonomy check result was written to OceanDNA-b23834/tc_result.tsv
[2023-03-15 17:40:29,880] [INFO] ===== Taxonomy check completed =====
[2023-03-15 17:40:29,880] [INFO] ===== Start completeness check using CheckM =====
[2023-03-15 17:40:29,881] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stgaf83e5ed-5d1a-4d5c-8968-2e245cd10363/dqc_reference/checkm_data
[2023-03-15 17:40:29,881] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-03-15 17:40:30,002] [INFO] Task started: CheckM
[2023-03-15 17:40:30,002] [INFO] Running command: checkm taxonomy_wf --tab_table -f OceanDNA-b23834/cc_result.tsv -t 1 life "Prokaryote" OceanDNA-b23834/checkm_input OceanDNA-b23834/checkm_result
[2023-03-15 17:41:14,986] [INFO] Task succeeded: CheckM
[2023-03-15 17:41:14,987] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 76.14%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-03-15 17:41:14,989] [INFO] ===== Completeness check finished =====
[2023-03-15 17:41:14,989] [INFO] ===== Start GTDB Search =====
[2023-03-15 17:41:14,989] [INFO] Query marker FASTA already exists. Will reuse it. (OceanDNA-b23834/markers.fasta)
[2023-03-15 17:41:14,991] [INFO] Task started: Blastn
[2023-03-15 17:41:14,991] [INFO] Running command: blastn -query OceanDNA-b23834/markers.fasta -db /var/lib/cwl/stgaf83e5ed-5d1a-4d5c-8968-2e245cd10363/dqc_reference/reference_markers_gtdb.fasta -out OceanDNA-b23834/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-03-15 17:41:16,641] [INFO] Task succeeded: Blastn
[2023-03-15 17:41:16,642] [INFO] Selected 15 target genomes.
[2023-03-15 17:41:16,642] [INFO] Target genome list was writen to OceanDNA-b23834/target_genomes_gtdb.txt
[2023-03-15 17:41:16,946] [INFO] Task started: fastANI
[2023-03-15 17:41:16,946] [INFO] Running command: fastANI --query /var/lib/cwl/stg8e4ef203-ea95-42e6-b5b6-6666fbc9155f/OceanDNA-b23834.fa --refList OceanDNA-b23834/target_genomes_gtdb.txt --output OceanDNA-b23834/fastani_result_gtdb.tsv --threads 1
[2023-03-15 17:41:26,030] [INFO] Task succeeded: fastANI
[2023-03-15 17:41:26,039] [INFO] Found 13 fastANI hits (0 hits with ANI > circumscription radius)
[2023-03-15 17:41:26,039] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCA_017643585.1	s__Maricaulis sp017643585	79.7719	469	920	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Maricaulaceae;g__Maricaulis	95.0	99.94	99.90	0.97	0.94	3	-
GCA_017643475.1	s__Maricaulis sp017643475	79.5244	475	920	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Maricaulaceae;g__Maricaulis	95.0	N/A	N/A	N/A	N/A	1	-
GCA_017642925.1	s__Maricaulis sp017642925	79.334	460	920	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Maricaulaceae;g__Maricaulis	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003634045.1	s__Maricaulis maris	79.2147	433	920	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Maricaulaceae;g__Maricaulis	95.0	98.04	98.04	0.96	0.96	2	-
GCF_016817355.1	s__Maricaulis parjimensis	79.2119	454	920	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Maricaulaceae;g__Maricaulis	95.0	N/A	N/A	N/A	N/A	1	-
GCF_005871165.1	s__Maricaulis sp005871165	79.1727	484	920	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Maricaulaceae;g__Maricaulis	95.0	N/A	N/A	N/A	N/A	1	-
GCA_015665295.1	s__Maricaulis sp015665295	78.9415	445	920	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Maricaulaceae;g__Maricaulis	95.0	N/A	N/A	N/A	N/A	1	-
GCA_002694345.1	s__Maricaulis sp002694345	78.7395	308	920	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Maricaulaceae;g__Maricaulis	95.0	N/A	N/A	N/A	N/A	1	-
GCA_905480005.1	s__Maricaulis sp905480005	78.6385	403	920	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Maricaulaceae;g__Maricaulis	95.0	N/A	N/A	N/A	N/A	1	-
GCA_018222865.1	s__Maricaulis sp018222865	78.6241	393	920	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Maricaulaceae;g__Maricaulis	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000420265.1	s__Oceanicaulis alexandrii	77.1446	177	920	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Maricaulaceae;g__Oceanicaulis	95.0	97.56	97.05	0.94	0.85	5	-
GCA_017418605.1	s__CAISGS01 sp017418605	76.2259	117	920	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__CAISGS01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016201015.1	s__JACQDQ01 sp016201015	75.2963	59	920	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__CACIAM-22H2;f__CACIAM-22H2;g__JACQDQ01	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2023-03-15 17:41:26,039] [INFO] GTDB search result was written to OceanDNA-b23834/result_gtdb.tsv
[2023-03-15 17:41:26,039] [INFO] ===== GTDB Search completed =====
[2023-03-15 17:41:26,041] [INFO] DFAST_QC result json was written to OceanDNA-b23834/dqc_result.json
[2023-03-15 17:41:26,041] [INFO] DFAST_QC completed!
[2023-03-15 17:41:26,041] [INFO] Total running time: 0h1m31s