[2024-01-24 12:54:39,468] [INFO] DFAST_QC pipeline started.
[2024-01-24 12:54:39,471] [INFO] DFAST_QC version: 0.5.7
[2024-01-24 12:54:39,471] [INFO] DQC Reference Directory: /var/lib/cwl/stg817b8f93-8bc8-4bcf-9f73-acc05595ef56/dqc_reference
[2024-01-24 12:54:40,990] [INFO] ===== Start taxonomy check using ANI =====
[2024-01-24 12:54:40,991] [INFO] Task started: Prodigal
[2024-01-24 12:54:40,991] [INFO] Running command: gunzip -c /var/lib/cwl/stg8ad93383-e2cf-47d8-bc90-2d70de8d1af8/GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna.gz | prodigal -d GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/cds.fna -a GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/protein.faa -g 11 -q > /dev/null
[2024-01-24 12:54:51,738] [INFO] Task succeeded: Prodigal
[2024-01-24 12:54:51,738] [INFO] Task started: HMMsearch
[2024-01-24 12:54:51,739] [INFO] Running command: hmmsearch --tblout GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg817b8f93-8bc8-4bcf-9f73-acc05595ef56/dqc_reference/reference_markers.hmm GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/protein.faa > /dev/null
[2024-01-24 12:54:52,018] [INFO] Task succeeded: HMMsearch
[2024-01-24 12:54:52,022] [INFO] Found 6/6 markers.
[2024-01-24 12:54:52,056] [INFO] Query marker FASTA was written to GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/markers.fasta
[2024-01-24 12:54:52,056] [INFO] Task started: Blastn
[2024-01-24 12:54:52,057] [INFO] Running command: blastn -query GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/markers.fasta -db /var/lib/cwl/stg817b8f93-8bc8-4bcf-9f73-acc05595ef56/dqc_reference/reference_markers.fasta -out GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 12:54:52,795] [INFO] Task succeeded: Blastn
[2024-01-24 12:54:52,798] [INFO] Selected 23 target genomes.
[2024-01-24 12:54:52,799] [INFO] Target genome list was writen to GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/target_genomes.txt
[2024-01-24 12:54:52,806] [INFO] Task started: fastANI
[2024-01-24 12:54:52,806] [INFO] Running command: fastANI --query /var/lib/cwl/stg8ad93383-e2cf-47d8-bc90-2d70de8d1af8/GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna.gz --refList GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/target_genomes.txt --output GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/fastani_result.tsv --threads 1
[2024-01-24 12:55:09,095] [INFO] Task succeeded: fastANI
[2024-01-24 12:55:09,095] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg817b8f93-8bc8-4bcf-9f73-acc05595ef56/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2024-01-24 12:55:09,095] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg817b8f93-8bc8-4bcf-9f73-acc05595ef56/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2024-01-24 12:55:09,112] [INFO] Found 23 fastANI hits (1 hits with ANI > threshold)
[2024-01-24 12:55:09,113] [INFO] The taxonomy check result is classified as 'conclusive'.
[2024-01-24 12:55:09,113] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Cognatishimia maritima	strain=DSM 28223	GCA_900129685.1	870908	870908	type	True	100.0	1083	1089	95	conclusive
Cognatishimia activa	strain=CECT 5113	GCA_001458335.1	1715691	1715691	type	True	79.7565	437	1089	95	below_threshold
Marivivens aquimaris	strain=GSB7	GCA_015220045.1	2774876	2774876	type	True	78.3807	139	1089	95	below_threshold
Phaeobacter gallaeciensis	strain=DSM 26640	GCA_000511385.1	60890	60890	type	True	78.093	235	1089	95	below_threshold
Phaeobacter piscinae	strain=P14	GCA_002407245.1	1580596	1580596	type	True	77.6804	237	1089	95	below_threshold
Phaeobacter gallaeciensis	strain=DSM 26640	GCA_000819625.1	60890	60890	type	True	77.6562	225	1089	95	below_threshold
Zongyanglinia marina	strain=DSW4-44	GCA_005771405.1	2578117	2578117	type	True	77.6416	143	1089	95	below_threshold
Shimia sediminis	strain=ZQ172	GCA_003990645.1	2497945	2497945	type	True	77.6309	244	1089	95	below_threshold
Falsiruegeria mediterranea	strain=CECT 7615	GCA_900302455.1	1280832	1280832	type	True	77.5377	210	1089	95	below_threshold
Zongyanglinia huanghaiensis	strain=CY05	GCA_009753675.1	2682100	2682100	type	True	77.4478	148	1089	95	below_threshold
Shimia haliotis	strain=DSM 28453	GCA_900114415.1	1280847	1280847	type	True	77.4192	260	1089	95	below_threshold
Falsiruegeria litorea	strain=CECT 7639	GCA_900172225.1	1280831	1280831	type	True	77.4059	219	1089	95	below_threshold
Shimia thalassica	strain=CECT 7735	GCA_001458215.1	1715693	1715693	type	True	77.2665	225	1089	95	below_threshold
Roseovarius nubinhibens	strain=ISM	GCA_000152625.1	314263	314263	type	True	77.2481	186	1089	95	below_threshold
Tritonibacter scottomollicae	strain=DSM 25328	GCA_003003215.1	483013	483013	type	True	77.2115	214	1089	95	below_threshold
Shimia abyssi	strain=DSM 100673	GCA_003014475.1	1662395	1662395	type	True	77.1727	221	1089	95	below_threshold
Marivivens niveibacter	strain=MCCC 1A06712	GCA_002150005.2	1930667	1930667	type	True	77.1372	142	1089	95	below_threshold
Ruegeria marisrubri	strain=ZGT118	GCA_001507595.1	1685379	1685379	type	True	77.0292	192	1089	95	below_threshold
Aliiroseovarius halocynthiae	strain=MA1-10	GCA_007004645.1	985055	985055	type	True	77.0083	159	1089	95	below_threshold
Pseudorhodobacter aquimaris	strain=KCTC 23043	GCA_001202025.1	687412	687412	type	True	76.9834	107	1089	95	below_threshold
Pseudophaeobacter flagellatus	strain=MA21411-1	GCA_021228235.1	2899119	2899119	type	True	76.9318	201	1089	95	below_threshold
Chachezhania sediminis	strain=CAU 1508	GCA_009765275.1	2599291	2599291	type	True	76.9262	121	1089	95	below_threshold
Rhabdonatronobacter sediminivivens	strain=IM2376	GCA_013415485.1	2743469	2743469	type	True	76.3418	92	1089	95	below_threshold
--------------------------------------------------------------------------------
[2024-01-24 12:55:09,115] [INFO] DFAST Taxonomy check result was written to GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/tc_result.tsv
[2024-01-24 12:55:09,115] [INFO] ===== Taxonomy check completed =====
[2024-01-24 12:55:09,115] [INFO] ===== Start completeness check using CheckM =====
[2024-01-24 12:55:09,115] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg817b8f93-8bc8-4bcf-9f73-acc05595ef56/dqc_reference/checkm_data
[2024-01-24 12:55:09,117] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2024-01-24 12:55:09,155] [INFO] Task started: CheckM
[2024-01-24 12:55:09,155] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/checkm_input GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/checkm_result
[2024-01-24 12:55:44,432] [INFO] Task succeeded: CheckM
[2024-01-24 12:55:44,433] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2024-01-24 12:55:44,456] [INFO] ===== Completeness check finished =====
[2024-01-24 12:55:44,456] [INFO] ===== Start GTDB Search =====
[2024-01-24 12:55:44,457] [INFO] Query marker FASTA already exists. Will reuse it. (GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/markers.fasta)
[2024-01-24 12:55:44,457] [INFO] Task started: Blastn
[2024-01-24 12:55:44,457] [INFO] Running command: blastn -query GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/markers.fasta -db /var/lib/cwl/stg817b8f93-8bc8-4bcf-9f73-acc05595ef56/dqc_reference/reference_markers_gtdb.fasta -out GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 12:55:45,443] [INFO] Task succeeded: Blastn
[2024-01-24 12:55:45,447] [INFO] Selected 21 target genomes.
[2024-01-24 12:55:45,447] [INFO] Target genome list was writen to GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/target_genomes_gtdb.txt
[2024-01-24 12:55:45,464] [INFO] Task started: fastANI
[2024-01-24 12:55:45,464] [INFO] Running command: fastANI --query /var/lib/cwl/stg8ad93383-e2cf-47d8-bc90-2d70de8d1af8/GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna.gz --refList GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/target_genomes_gtdb.txt --output GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2024-01-24 12:56:00,286] [INFO] Task succeeded: fastANI
[2024-01-24 12:56:00,309] [INFO] Found 21 fastANI hits (1 hits with ANI > circumscription radius)
[2024-01-24 12:56:00,310] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_900129685.1	s__Cognatishimia maritima	100.0	1082	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Cognatishimia	95.0	N/A	N/A	N/A	N/A	1	conclusive
GCF_001458335.1	s__Cognatishimia activa	79.7483	438	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Cognatishimia	95.0	99.97	99.97	0.99	0.99	2	-
GCF_017798205.1	s__Cognatishimia activa_A	78.7211	333	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Cognatishimia	95.0	N/A	N/A	N/A	N/A	1	-
GCA_013215215.1	s__Cognatishimia sp013215215	78.3941	313	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Cognatishimia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000511385.1	s__Phaeobacter gallaeciensis	78.0214	234	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Phaeobacter	95.0	99.42	98.01	0.98	0.96	8	-
GCF_900114635.1	s__Shimia aestuarii	77.9953	259	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Shimia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_009496005.1	s__Epibacterium litoralis	77.8665	216	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Epibacterium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_017743735.1	s__Shimia sp017743735	77.7985	305	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Shimia	95.0	95.87	95.63	0.93	0.93	3	-
GCF_017744095.1	s__Shimia sp017744095	77.7094	259	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Shimia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001258055.1	s__Phaeobacter italicus	77.6864	236	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Phaeobacter	95.0	99.00	98.06	0.96	0.92	7	-
GCF_900302455.1	s__Ruegeria_E mediterranea	77.5319	209	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ruegeria_E	95.0	99.84	99.84	0.96	0.96	2	-
GCA_900143635.1	s__FREY01 sp900143635	77.529	226	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__FREY01	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900102795.1	s__Epibacterium ulvae	77.4793	160	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Epibacterium	95.0	99.99	99.99	0.99	0.99	2	-
GCF_900172225.1	s__Ruegeria_E litorea	77.4059	219	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ruegeria_E	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001679925.1	s__Leisingera sp001679925	77.2473	224	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Leisingera	95.0	N/A	N/A	N/A	N/A	1	-
GCA_900143615.1	s__Shimia sp900143615	77.2242	202	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Shimia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003014475.1	s__Shimia abyssi	77.1855	220	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Shimia	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900142185.1	s__Lutimaribacter pacificus	77.0721	187	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Lutimaribacter	95.0	100.00	100.00	1.00	1.00	2	-
GCF_001507595.1	s__Ruegeria marisrubri	77.0292	192	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ruegeria	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003122245.1	s__Roseovarius sp003122245	76.4609	125	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Roseovarius	95.0	N/A	N/A	N/A	N/A	1	-
GCA_002440625.1	s__Albidovulum sp002440625	75.8277	59	1089	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Albidovulum	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2024-01-24 12:56:00,311] [INFO] GTDB search result was written to GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/result_gtdb.tsv
[2024-01-24 12:56:00,312] [INFO] ===== GTDB Search completed =====
[2024-01-24 12:56:00,316] [INFO] DFAST_QC result json was written to GCF_900129685.1_IMG-taxon_2617270829_annotated_assembly_genomic.fna/dqc_result.json
[2024-01-24 12:56:00,317] [INFO] DFAST_QC completed!
[2024-01-24 12:56:00,317] [INFO] Total running time: 0h1m21s