[2023-06-30 05:03:47,259] [INFO] DFAST_QC pipeline started.
[2023-06-30 05:03:47,262] [INFO] DFAST_QC version: 0.5.7
[2023-06-30 05:03:47,262] [INFO] DQC Reference Directory: /var/lib/cwl/stg31433a8a-e485-4a8a-9ec8-d0bfea43b1f2/dqc_reference
[2023-06-30 05:03:48,662] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-30 05:03:48,663] [INFO] Task started: Prodigal
[2023-06-30 05:03:48,663] [INFO] Running command: gunzip -c /var/lib/cwl/stg5be51eee-4442-42ea-bc77-3502bf477ad2/GCA_024685845.1_ASM2468584v1_genomic.fna.gz | prodigal -d GCA_024685845.1_ASM2468584v1_genomic.fna/cds.fna -a GCA_024685845.1_ASM2468584v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-30 05:03:55,198] [INFO] Task succeeded: Prodigal
[2023-06-30 05:03:55,198] [INFO] Task started: HMMsearch
[2023-06-30 05:03:55,198] [INFO] Running command: hmmsearch --tblout GCA_024685845.1_ASM2468584v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg31433a8a-e485-4a8a-9ec8-d0bfea43b1f2/dqc_reference/reference_markers.hmm GCA_024685845.1_ASM2468584v1_genomic.fna/protein.faa > /dev/null
[2023-06-30 05:03:55,441] [INFO] Task succeeded: HMMsearch
[2023-06-30 05:03:55,443] [WARNING] Found 5/6 markers. [/var/lib/cwl/stg5be51eee-4442-42ea-bc77-3502bf477ad2/GCA_024685845.1_ASM2468584v1_genomic.fna.gz]
[2023-06-30 05:03:55,477] [INFO] Query marker FASTA was written to GCA_024685845.1_ASM2468584v1_genomic.fna/markers.fasta
[2023-06-30 05:03:55,477] [INFO] Task started: Blastn
[2023-06-30 05:03:55,477] [INFO] Running command: blastn -query GCA_024685845.1_ASM2468584v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg31433a8a-e485-4a8a-9ec8-d0bfea43b1f2/dqc_reference/reference_markers.fasta -out GCA_024685845.1_ASM2468584v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-30 05:03:56,124] [INFO] Task succeeded: Blastn
[2023-06-30 05:03:56,128] [INFO] Selected 26 target genomes.
[2023-06-30 05:03:56,128] [INFO] Target genome list was writen to GCA_024685845.1_ASM2468584v1_genomic.fna/target_genomes.txt
[2023-06-30 05:03:56,130] [INFO] Task started: fastANI
[2023-06-30 05:03:56,130] [INFO] Running command: fastANI --query /var/lib/cwl/stg5be51eee-4442-42ea-bc77-3502bf477ad2/GCA_024685845.1_ASM2468584v1_genomic.fna.gz --refList GCA_024685845.1_ASM2468584v1_genomic.fna/target_genomes.txt --output GCA_024685845.1_ASM2468584v1_genomic.fna/fastani_result.tsv --threads 1
[2023-06-30 05:04:11,253] [INFO] Task succeeded: fastANI
[2023-06-30 05:04:11,253] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg31433a8a-e485-4a8a-9ec8-d0bfea43b1f2/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-30 05:04:11,254] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg31433a8a-e485-4a8a-9ec8-d0bfea43b1f2/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-30 05:04:11,268] [INFO] Found 17 fastANI hits (0 hits with ANI > threshold)
[2023-06-30 05:04:11,268] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-06-30 05:04:11,269] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Ascidiaceihabitans donghaensis	strain=CECT 8599	GCA_900302465.1	1510460	1510460	type	True	77.6736	145	667	95	below_threshold
Sulfitobacter marinus	strain=DSM 23422	GCA_900116285.1	394264	394264	type	True	76.9304	87	667	95	below_threshold
Sulfitobacter undariae	strain=DSM 102234	GCA_014196805.1	1563671	1563671	type	True	76.9034	94	667	95	below_threshold
Sulfitobacter donghicola	strain=KCTC 12864	GCA_000622405.1	421000	421000	type	True	76.7242	106	667	95	below_threshold
Sulfitobacter donghicola	strain=JCM 14565	GCA_000712275.1	421000	421000	type	True	76.6931	106	667	95	below_threshold
Pseudosulfitobacter pseudonitzschiae	strain=H3	GCA_000712315.1	1402135	1402135	type	True	76.3576	87	667	95	below_threshold
Pseudosulfitobacter pseudonitzschiae	strain=DSM 26824	GCA_900129395.1	1402135	1402135	type	True	76.3153	89	667	95	below_threshold
Shimia thalassica	strain=CECT 7735	GCA_001458215.1	1715693	1715693	type	True	76.2849	68	667	95	below_threshold
Shimia marina	strain=CECT 7688	GCA_001458175.1	321267	321267	type	True	76.2262	54	667	95	below_threshold
Roseobacter litoralis	strain=Och 149	GCA_000154785.2	42443	42443	type	True	76.2159	63	667	95	below_threshold
Zongyanglinia huanghaiensis	strain=CY05	GCA_009753675.1	2682100	2682100	type	True	76.1585	62	667	95	below_threshold
Epibacterium ulvae	strain=U95	GCA_002796795.1	1156985	1156985	type	True	76.0635	56	667	95	below_threshold
Epibacterium ulvae	strain=U95	GCA_900102795.1	1156985	1156985	type	True	76.0188	56	667	95	below_threshold
Tritonibacter multivorans	strain=CECT 7557	GCA_001458415.1	928856	928856	type	True	75.9807	57	667	95	below_threshold
Tritonibacter multivorans	strain=DSM 26470	GCA_900112515.1	928856	928856	type	True	75.9807	57	667	95	below_threshold
Marivivens niveibacter	strain=MCCC 1A06712	GCA_002150005.2	1930667	1930667	type	True	75.9443	55	667	95	below_threshold
Actibacterium pelagium	strain=JN33	GCA_002285415.1	2029103	2029103	type	True	75.6609	50	667	95	below_threshold
--------------------------------------------------------------------------------
[2023-06-30 05:04:11,271] [INFO] DFAST Taxonomy check result was written to GCA_024685845.1_ASM2468584v1_genomic.fna/tc_result.tsv
[2023-06-30 05:04:11,271] [INFO] ===== Taxonomy check completed =====
[2023-06-30 05:04:11,271] [INFO] ===== Start completeness check using CheckM =====
[2023-06-30 05:04:11,272] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg31433a8a-e485-4a8a-9ec8-d0bfea43b1f2/dqc_reference/checkm_data
[2023-06-30 05:04:11,273] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-30 05:04:11,303] [INFO] Task started: CheckM
[2023-06-30 05:04:11,303] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_024685845.1_ASM2468584v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_024685845.1_ASM2468584v1_genomic.fna/checkm_input GCA_024685845.1_ASM2468584v1_genomic.fna/checkm_result
[2023-06-30 05:04:36,833] [INFO] Task succeeded: CheckM
[2023-06-30 05:04:36,835] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 69.74%
Contamintation: 4.17%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-06-30 05:04:36,861] [INFO] ===== Completeness check finished =====
[2023-06-30 05:04:36,862] [INFO] ===== Start GTDB Search =====
[2023-06-30 05:04:36,862] [INFO] Query marker FASTA already exists. Will reuse it. (GCA_024685845.1_ASM2468584v1_genomic.fna/markers.fasta)
[2023-06-30 05:04:36,863] [INFO] Task started: Blastn
[2023-06-30 05:04:36,863] [INFO] Running command: blastn -query GCA_024685845.1_ASM2468584v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg31433a8a-e485-4a8a-9ec8-d0bfea43b1f2/dqc_reference/reference_markers_gtdb.fasta -out GCA_024685845.1_ASM2468584v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-30 05:04:37,790] [INFO] Task succeeded: Blastn
[2023-06-30 05:04:37,794] [INFO] Selected 14 target genomes.
[2023-06-30 05:04:37,795] [INFO] Target genome list was writen to GCA_024685845.1_ASM2468584v1_genomic.fna/target_genomes_gtdb.txt
[2023-06-30 05:04:37,800] [INFO] Task started: fastANI
[2023-06-30 05:04:37,800] [INFO] Running command: fastANI --query /var/lib/cwl/stg5be51eee-4442-42ea-bc77-3502bf477ad2/GCA_024685845.1_ASM2468584v1_genomic.fna.gz --refList GCA_024685845.1_ASM2468584v1_genomic.fna/target_genomes_gtdb.txt --output GCA_024685845.1_ASM2468584v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-30 05:04:45,900] [INFO] Task succeeded: fastANI
[2023-06-30 05:04:45,919] [INFO] Found 10 fastANI hits (0 hits with ANI > circumscription radius)
[2023-06-30 05:04:45,919] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCA_002478745.1	s__Ascidiaceihabitans sp002478745	94.9971	519	667	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ascidiaceihabitans	95.0	99.75	99.75	0.89	0.89	2	-
GCA_018647275.1	s__Ascidiaceihabitans sp018647275	92.6248	305	667	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ascidiaceihabitans	95.0	N/A	N/A	N/A	N/A	1	-
GCA_905182105.1	s__Ascidiaceihabitans sp905182105	90.2996	463	667	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ascidiaceihabitans	95.0	N/A	N/A	N/A	N/A	1	-
GCA_905182355.1	s__Ascidiaceihabitans sp905182355	79.2087	259	667	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ascidiaceihabitans	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900302465.1	s__Ascidiaceihabitans donghaensis	77.6736	145	667	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ascidiaceihabitans	95.0	N/A	N/A	N/A	N/A	1	-
GCA_013139555.1	s__Sulfitobacter sp013139555	76.8285	87	667	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Sulfitobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000622405.1	s__Sulfitobacter donghicola	76.7242	106	667	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Sulfitobacter	95.0	99.99	99.99	1.00	1.00	2	-
GCF_900129395.1	s__Ascidiaceihabitans pseudonitzschiae	76.3153	89	667	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ascidiaceihabitans	95.0	99.99	99.97	0.98	0.96	4	-
GCF_900102795.1	s__Epibacterium ulvae	76.0188	56	667	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Epibacterium	95.0	99.99	99.99	0.99	0.99	2	-
GCF_001458415.1	s__Epibacterium multivorans	75.9807	57	667	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Epibacterium	95.0	99.99	99.99	0.99	0.99	2	-
--------------------------------------------------------------------------------
[2023-06-30 05:04:45,922] [INFO] GTDB search result was written to GCA_024685845.1_ASM2468584v1_genomic.fna/result_gtdb.tsv
[2023-06-30 05:04:45,922] [INFO] ===== GTDB Search completed =====
[2023-06-30 05:04:45,927] [INFO] DFAST_QC result json was written to GCA_024685845.1_ASM2468584v1_genomic.fna/dqc_result.json
[2023-06-30 05:04:45,927] [INFO] DFAST_QC completed!
[2023-06-30 05:04:45,927] [INFO] Total running time: 0h0m59s