[2023-06-18 21:22:02,421] [INFO] DFAST_QC pipeline started.
[2023-06-18 21:22:02,423] [INFO] DFAST_QC version: 0.5.7
[2023-06-18 21:22:02,423] [INFO] DQC Reference Directory: /var/lib/cwl/stg6536e58a-3d4a-48a8-8315-d35910af5e9e/dqc_reference
[2023-06-18 21:22:04,570] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-18 21:22:04,571] [INFO] Task started: Prodigal
[2023-06-18 21:22:04,572] [INFO] Running command: gunzip -c /var/lib/cwl/stgd3948cfc-63ea-469a-889e-2e1ce64c1676/GCA_018402035.1_ASM1840203v1_genomic.fna.gz | prodigal -d GCA_018402035.1_ASM1840203v1_genomic.fna/cds.fna -a GCA_018402035.1_ASM1840203v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-18 21:22:14,833] [INFO] Task succeeded: Prodigal
[2023-06-18 21:22:14,834] [INFO] Task started: HMMsearch
[2023-06-18 21:22:14,834] [INFO] Running command: hmmsearch --tblout GCA_018402035.1_ASM1840203v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg6536e58a-3d4a-48a8-8315-d35910af5e9e/dqc_reference/reference_markers.hmm GCA_018402035.1_ASM1840203v1_genomic.fna/protein.faa > /dev/null
[2023-06-18 21:22:15,084] [INFO] Task succeeded: HMMsearch
[2023-06-18 21:22:15,086] [INFO] Found 6/6 markers.
[2023-06-18 21:22:15,120] [INFO] Query marker FASTA was written to GCA_018402035.1_ASM1840203v1_genomic.fna/markers.fasta
[2023-06-18 21:22:15,121] [INFO] Task started: Blastn
[2023-06-18 21:22:15,121] [INFO] Running command: blastn -query GCA_018402035.1_ASM1840203v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg6536e58a-3d4a-48a8-8315-d35910af5e9e/dqc_reference/reference_markers.fasta -out GCA_018402035.1_ASM1840203v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-18 21:22:15,973] [INFO] Task succeeded: Blastn
[2023-06-18 21:22:15,977] [INFO] Selected 26 target genomes.
[2023-06-18 21:22:15,977] [INFO] Target genome list was writen to GCA_018402035.1_ASM1840203v1_genomic.fna/target_genomes.txt
[2023-06-18 21:22:15,980] [INFO] Task started: fastANI
[2023-06-18 21:22:15,980] [INFO] Running command: fastANI --query /var/lib/cwl/stgd3948cfc-63ea-469a-889e-2e1ce64c1676/GCA_018402035.1_ASM1840203v1_genomic.fna.gz --refList GCA_018402035.1_ASM1840203v1_genomic.fna/target_genomes.txt --output GCA_018402035.1_ASM1840203v1_genomic.fna/fastani_result.tsv --threads 1
[2023-06-18 21:22:32,830] [INFO] Task succeeded: fastANI
[2023-06-18 21:22:32,831] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg6536e58a-3d4a-48a8-8315-d35910af5e9e/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-18 21:22:32,831] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg6536e58a-3d4a-48a8-8315-d35910af5e9e/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-18 21:22:32,855] [INFO] Found 26 fastANI hits (0 hits with ANI > threshold)
[2023-06-18 21:22:32,855] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-06-18 21:22:32,855] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Roseibacterium elongatum	strain=DFL-43	GCA_000590925.1	159346	159346	type	True	78.7318	452	962	95	below_threshold
Rhodovulum tesquicola	strain=A-36s	GCA_024128855.1	540254	540254	type	True	77.9901	272	962	95	below_threshold
Ruegeria pomeroyi	strain=DSS-3	GCA_000011965.2	89184	89184	suspected-type	True	77.8388	301	962	95	below_threshold
Nioella ostreopsis	strain=Z7-4	GCA_004000255.1	2448479	2448479	type	True	77.7417	311	962	95	below_threshold
Nioella nitratireducens	strain=SSW136	GCA_001879715.1	1287720	1287720	type	True	77.6446	301	962	95	below_threshold
Rhodovulum bhavnagarense	strain=DSM 24766	GCA_004343505.1	992286	992286	type	True	77.6431	209	962	95	below_threshold
Rhodovulum marinum	strain=DSM 18063	GCA_004343075.1	320662	320662	type	True	77.5305	276	962	95	below_threshold
Roseovarius litoreus	strain=DSM 28249	GCA_900142765.1	1155722	1155722	type	True	77.5121	268	962	95	below_threshold
Limimaricola hongkongensis	strain=UST950701-009P	GCA_000365005.1	278132	278132	type	True	77.4733	241	962	95	below_threshold
Nioella sediminis	strain=JS7-11	GCA_001879695.1	1912092	1912092	type	True	77.4693	316	962	95	below_threshold
Rhodophyticola porphyridii	strain=MA-7-27	GCA_003688285.1	1852017	1852017	type	True	77.38	318	962	95	below_threshold
Salipiger marinus	strain=DSM 26424	GCA_900100085.1	555512	555512	type	True	77.3541	306	962	95	below_threshold
Rhabdonatronobacter sediminivivens	strain=IM2376	GCA_013415485.1	2743469	2743469	type	True	77.2411	232	962	95	below_threshold
Roseicitreum antarcticum	strain=ZS2-28	GCA_014681765.1	564137	564137	type	True	77.2125	218	962	95	below_threshold
Pelagivirga dicentrarchi	strain=YLY04	GCA_003316635.1	2250573	2250573	type	True	77.198	203	962	95	below_threshold
Roseovarius pacificus	strain=CGMCC 1.7083	GCA_014645335.1	337701	337701	type	True	77.1408	247	962	95	below_threshold
Thalassobius mangrovi	strain=GS-10	GCA_009857745.1	2692236	2692236	type	True	77.1321	258	962	95	below_threshold
Pseudooceanicola nanhaiensis	strain=CGMCC 1.6293	GCA_014645095.1	375761	375761	type	True	77.1246	206	962	95	below_threshold
Pseudooceanicola aestuarii	strain=E2-1	GCA_010614805.1	2697319	2697319	type	True	77.1114	205	962	95	below_threshold
Pseudooceanicola nanhaiensis	strain=DSM 18065	GCA_000688295.1	375761	375761	type	True	77.1071	203	962	95	below_threshold
Roseovarius faecimaris	strain=MME-070	GCA_009762325.1	2494550	2494550	type	True	77.088	226	962	95	below_threshold
Roseovarius halotolerans	strain=DSM 29507	GCA_003634925.1	505353	505353	type	True	77.0856	244	962	95	below_threshold
Cereibacter sediminicola	strain=JA983	GCA_007668225.1	2584941	2584941	type	True	76.9969	200	962	95	below_threshold
Pseudosulfitobacter pseudonitzschiae	strain=DSM 26824	GCA_900129395.1	1402135	1402135	type	True	76.8828	202	962	95	below_threshold
Pseudosulfitobacter pseudonitzschiae	strain=H3	GCA_000712315.1	1402135	1402135	type	True	76.8402	201	962	95	below_threshold
Allosediminivita pacifica	strain=CGMCC 1.12410	GCA_014637495.1	1267769	1267769	type	True	76.6614	170	962	95	below_threshold
--------------------------------------------------------------------------------
[2023-06-18 21:22:32,859] [INFO] DFAST Taxonomy check result was written to GCA_018402035.1_ASM1840203v1_genomic.fna/tc_result.tsv
[2023-06-18 21:22:32,860] [INFO] ===== Taxonomy check completed =====
[2023-06-18 21:22:32,860] [INFO] ===== Start completeness check using CheckM =====
[2023-06-18 21:22:32,860] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg6536e58a-3d4a-48a8-8315-d35910af5e9e/dqc_reference/checkm_data
[2023-06-18 21:22:32,861] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-18 21:22:32,894] [INFO] Task started: CheckM
[2023-06-18 21:22:32,895] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_018402035.1_ASM1840203v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_018402035.1_ASM1840203v1_genomic.fna/checkm_input GCA_018402035.1_ASM1840203v1_genomic.fna/checkm_result
[2023-06-18 21:23:06,949] [INFO] Task succeeded: CheckM
[2023-06-18 21:23:06,951] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 98.11%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-06-18 21:23:06,978] [INFO] ===== Completeness check finished =====
[2023-06-18 21:23:06,978] [INFO] ===== Start GTDB Search =====
[2023-06-18 21:23:06,979] [INFO] Query marker FASTA already exists. Will reuse it. (GCA_018402035.1_ASM1840203v1_genomic.fna/markers.fasta)
[2023-06-18 21:23:06,979] [INFO] Task started: Blastn
[2023-06-18 21:23:06,979] [INFO] Running command: blastn -query GCA_018402035.1_ASM1840203v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg6536e58a-3d4a-48a8-8315-d35910af5e9e/dqc_reference/reference_markers_gtdb.fasta -out GCA_018402035.1_ASM1840203v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-18 21:23:08,501] [INFO] Task succeeded: Blastn
[2023-06-18 21:23:08,508] [INFO] Selected 9 target genomes.
[2023-06-18 21:23:08,508] [INFO] Target genome list was writen to GCA_018402035.1_ASM1840203v1_genomic.fna/target_genomes_gtdb.txt
[2023-06-18 21:23:08,516] [INFO] Task started: fastANI
[2023-06-18 21:23:08,517] [INFO] Running command: fastANI --query /var/lib/cwl/stgd3948cfc-63ea-469a-889e-2e1ce64c1676/GCA_018402035.1_ASM1840203v1_genomic.fna.gz --refList GCA_018402035.1_ASM1840203v1_genomic.fna/target_genomes_gtdb.txt --output GCA_018402035.1_ASM1840203v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-18 21:23:14,574] [INFO] Task succeeded: fastANI
[2023-06-18 21:23:14,586] [INFO] Found 9 fastANI hits (1 hits with ANI > circumscription radius)
[2023-06-18 21:23:14,586] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCA_018402035.1	s__Roseicyclus sp018402035	100.0	961	962	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Roseicyclus	95.0	N/A	N/A	N/A	N/A	1	conclusive
GCA_015689735.1	s__Roseicyclus sp015689735	85.307	768	962	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Roseicyclus	95.0	N/A	N/A	N/A	N/A	1	-
GCA_018401965.1	s__Roseicyclus sp018401965	80.8323	520	962	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Roseicyclus	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003148775.1	s__Roseicyclus mahoneyensis	80.3628	602	962	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Roseicyclus	95.0	N/A	N/A	N/A	N/A	1	-
GCA_018401055.1	s__Roseicyclus sp018401055	80.0568	491	962	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Roseicyclus	95.0	N/A	N/A	N/A	N/A	1	-
GCF_012395815.1	s__Roseicyclus sp012395815	79.3183	553	962	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Roseicyclus	95.0	N/A	N/A	N/A	N/A	1	-
GCA_015689745.1	s__Roseicyclus sp015689745	79.3174	510	962	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Roseicyclus	95.0	N/A	N/A	N/A	N/A	1	-
GCA_014359765.1	s__Roseovarius sp014359765	77.6479	248	962	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Roseovarius	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003651245.1	s__Roseovarius spongiae	76.9152	209	962	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Roseovarius	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2023-06-18 21:23:14,589] [INFO] GTDB search result was written to GCA_018402035.1_ASM1840203v1_genomic.fna/result_gtdb.tsv
[2023-06-18 21:23:14,589] [INFO] ===== GTDB Search completed =====
[2023-06-18 21:23:14,594] [INFO] DFAST_QC result json was written to GCA_018402035.1_ASM1840203v1_genomic.fna/dqc_result.json
[2023-06-18 21:23:14,594] [INFO] DFAST_QC completed!
[2023-06-18 21:23:14,595] [INFO] Total running time: 0h1m12s