[2023-06-30 23:29:01,150] [INFO] DFAST_QC pipeline started.
[2023-06-30 23:29:01,153] [INFO] DFAST_QC version: 0.5.7
[2023-06-30 23:29:01,153] [INFO] DQC Reference Directory: /var/lib/cwl/stgf529c1a2-6f1d-4365-b073-3a8f2b1bcd40/dqc_reference
[2023-06-30 23:29:04,038] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-30 23:29:04,039] [INFO] Task started: Prodigal
[2023-06-30 23:29:04,040] [INFO] Running command: gunzip -c /var/lib/cwl/stge5fbb83f-7c44-42f1-a316-30cedec1380b/GCA_024643855.1_ASM2464385v1_genomic.fna.gz | prodigal -d GCA_024643855.1_ASM2464385v1_genomic.fna/cds.fna -a GCA_024643855.1_ASM2464385v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-30 23:29:11,685] [INFO] Task succeeded: Prodigal
[2023-06-30 23:29:11,685] [INFO] Task started: HMMsearch
[2023-06-30 23:29:11,686] [INFO] Running command: hmmsearch --tblout GCA_024643855.1_ASM2464385v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stgf529c1a2-6f1d-4365-b073-3a8f2b1bcd40/dqc_reference/reference_markers.hmm GCA_024643855.1_ASM2464385v1_genomic.fna/protein.faa > /dev/null
[2023-06-30 23:29:11,944] [INFO] Task succeeded: HMMsearch
[2023-06-30 23:29:11,946] [INFO] Found 6/6 markers.
[2023-06-30 23:29:11,980] [INFO] Query marker FASTA was written to GCA_024643855.1_ASM2464385v1_genomic.fna/markers.fasta
[2023-06-30 23:29:11,981] [INFO] Task started: Blastn
[2023-06-30 23:29:11,981] [INFO] Running command: blastn -query GCA_024643855.1_ASM2464385v1_genomic.fna/markers.fasta -db /var/lib/cwl/stgf529c1a2-6f1d-4365-b073-3a8f2b1bcd40/dqc_reference/reference_markers.fasta -out GCA_024643855.1_ASM2464385v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-30 23:29:12,596] [INFO] Task succeeded: Blastn
[2023-06-30 23:29:12,600] [INFO] Selected 34 target genomes.
[2023-06-30 23:29:12,601] [INFO] Target genome list was writen to GCA_024643855.1_ASM2464385v1_genomic.fna/target_genomes.txt
[2023-06-30 23:29:12,608] [INFO] Task started: fastANI
[2023-06-30 23:29:12,608] [INFO] Running command: fastANI --query /var/lib/cwl/stge5fbb83f-7c44-42f1-a316-30cedec1380b/GCA_024643855.1_ASM2464385v1_genomic.fna.gz --refList GCA_024643855.1_ASM2464385v1_genomic.fna/target_genomes.txt --output GCA_024643855.1_ASM2464385v1_genomic.fna/fastani_result.tsv --threads 1
[2023-06-30 23:29:32,553] [INFO] Task succeeded: fastANI
[2023-06-30 23:29:32,554] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stgf529c1a2-6f1d-4365-b073-3a8f2b1bcd40/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-30 23:29:32,555] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stgf529c1a2-6f1d-4365-b073-3a8f2b1bcd40/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-30 23:29:32,583] [INFO] Found 23 fastANI hits (1 hits with ANI > threshold)
[2023-06-30 23:29:32,583] [INFO] The taxonomy check result is classified as 'conclusive'.
[2023-06-30 23:29:32,584] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name strain accession taxid species_taxid relation_to_type validated ani matched_fragments total_fragments ani_threshold status
Planktomarina temperata strain=RCA23 GCA_000738435.1 1284658 1284658 type True 98.9213 722 738 95 conclusive
Nereida ignava strain=CECT 5292 GCA_001049735.1 282199 282199 type True 76.3358 52 738 95 below_threshold
Nereida ignava strain=DSM 16309 GCA_900114125.1 282199 282199 type True 76.2897 50 738 95 below_threshold
Actibacterium atlanticum strain=22II-S11-z10 GCA_000671395.1 1461693 1461693 type True 76.1252 54 738 95 below_threshold
Pseudophaeobacter flagellatus strain=MA21411-1 GCA_021228235.1 2899119 2899119 type True 76.1201 80 738 95 below_threshold
Phaeobacter porticola strain=P97 GCA_001888185.1 1844006 1844006 type True 76.0842 70 738 95 below_threshold
Epibacterium ulvae strain=U95 GCA_002796795.1 1156985 1156985 type True 75.9771 60 738 95 below_threshold
Epibacterium ulvae strain=U95 GCA_900102795.1 1156985 1156985 type True 75.9606 61 738 95 below_threshold
Pelagicola litoralis strain=CL-ES2 GCA_005518135.1 420403 420403 type True 75.9556 58 738 95 below_threshold
Puniceibacterium antarcticum strain=SM1211 GCA_002760615.1 1206336 1206336 type True 75.9091 54 738 95 below_threshold
Roseovarius marisflavi strain=DSM 29327 GCA_900142625.1 1054996 1054996 type True 75.8515 73 738 95 below_threshold
Sulfitobacter brevis strain=DSM 11443 GCA_900112755.1 74348 74348 type True 75.8495 58 738 95 below_threshold
Tritonibacter multivorans strain=CECT 7557 GCA_001458415.1 928856 928856 type True 75.8308 82 738 95 below_threshold
Leisingera methylohalidivorans strain=DSM 14336; MB2 GCA_000511355.1 133924 133924 type True 75.8279 59 738 95 below_threshold
Pelagivirga sediminicola strain=BH-SD19 GCA_003072125.1 2170575 2170575 type True 75.8172 55 738 95 below_threshold
Tritonibacter litoralis strain=SM1979 GCA_009496005.1 2662264 2662264 type True 75.8008 69 738 95 below_threshold
Aestuariivita boseongensis strain=BS-B2 GCA_001262635.1 1470562 1470562 type True 75.6601 76 738 95 below_threshold
Pseudoprimorskyibacter insulae strain=CECT 8871 GCA_900302505.1 1695997 1695997 type True 75.6265 59 738 95 below_threshold
Rhabdonatronobacter sediminivivens strain=IM2376 GCA_013415485.1 2743469 2743469 type True 75.5816 51 738 95 below_threshold
Roseovarius gaetbuli strain=CECT 8370 GCA_900172365.1 1356575 1356575 type True 75.5479 83 738 95 below_threshold
Salipiger marinus strain=DSM 26424 GCA_900100085.1 555512 555512 type True 75.5039 61 738 95 below_threshold
Pseudorhodobacter turbinis strain=S12M18 GCA_005234135.1 2500533 2500533 type True 75.4789 55 738 95 below_threshold
Roseovarius gahaiensis strain=GH877 GCA_011601345.1 2716691 2716691 type True 75.4756 63 738 95 below_threshold
--------------------------------------------------------------------------------
[2023-06-30 23:29:32,587] [INFO] DFAST Taxonomy check result was written to GCA_024643855.1_ASM2464385v1_genomic.fna/tc_result.tsv
[2023-06-30 23:29:32,588] [INFO] ===== Taxonomy check completed =====
[2023-06-30 23:29:32,588] [INFO] ===== Start completeness check using CheckM =====
[2023-06-30 23:29:32,588] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stgf529c1a2-6f1d-4365-b073-3a8f2b1bcd40/dqc_reference/checkm_data
[2023-06-30 23:29:32,590] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-30 23:29:32,619] [INFO] Task started: CheckM
[2023-06-30 23:29:32,619] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_024643855.1_ASM2464385v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_024643855.1_ASM2464385v1_genomic.fna/checkm_input GCA_024643855.1_ASM2464385v1_genomic.fna/checkm_result
[2023-06-30 23:30:00,883] [INFO] Task succeeded: CheckM
[2023-06-30 23:30:00,884] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-06-30 23:30:00,907] [INFO] ===== Completeness check finished =====
[2023-06-30 23:30:00,908] [INFO] ===== Start GTDB Search =====
[2023-06-30 23:30:00,908] [INFO] Query marker FASTA already exists. Will reuse it. (GCA_024643855.1_ASM2464385v1_genomic.fna/markers.fasta)
[2023-06-30 23:30:00,909] [INFO] Task started: Blastn
[2023-06-30 23:30:00,909] [INFO] Running command: blastn -query GCA_024643855.1_ASM2464385v1_genomic.fna/markers.fasta -db /var/lib/cwl/stgf529c1a2-6f1d-4365-b073-3a8f2b1bcd40/dqc_reference/reference_markers_gtdb.fasta -out GCA_024643855.1_ASM2464385v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-30 23:30:01,779] [INFO] Task succeeded: Blastn
[2023-06-30 23:30:01,783] [INFO] Selected 9 target genomes.
[2023-06-30 23:30:01,784] [INFO] Target genome list was writen to GCA_024643855.1_ASM2464385v1_genomic.fna/target_genomes_gtdb.txt
[2023-06-30 23:30:01,791] [INFO] Task started: fastANI
[2023-06-30 23:30:01,791] [INFO] Running command: fastANI --query /var/lib/cwl/stge5fbb83f-7c44-42f1-a316-30cedec1380b/GCA_024643855.1_ASM2464385v1_genomic.fna.gz --refList GCA_024643855.1_ASM2464385v1_genomic.fna/target_genomes_gtdb.txt --output GCA_024643855.1_ASM2464385v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-30 23:30:06,922] [INFO] Task succeeded: fastANI
[2023-06-30 23:30:06,933] [INFO] Found 8 fastANI hits (1 hits with ANI > circumscription radius)
[2023-06-30 23:30:06,933] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession gtdb_species ani matched_fragments total_fragments gtdb_taxonomy ani_circumscription_radius mean_intra_species_ani min_intra_species_ani mean_intra_species_af min_intra_species_af num_clustered_genomes status
GCF_000738435.1 s__Planktomarina temperata 98.9213 722 738 d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Planktomarina 95.0 98.99 97.76 0.97 0.80 20 conclusive
GCA_016780825.1 s__Planktomarina sp016780825 90.5147 634 738 d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Planktomarina 95.0 N/A N/A N/A N/A 1 -
GCA_905181815.1 s__Planktomarina sp905181815 89.9886 647 738 d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Planktomarina 95.0 N/A N/A N/A N/A 1 -
GCA_002683685.1 s__Planktomarina sp002683685 87.1291 497 738 d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Planktomarina 95.0 95.59 95.45 0.70 0.67 6 -
GCA_905182455.1 s__Planktomarina sp905182455 78.006 301 738 d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Planktomarina 95.0 N/A N/A N/A N/A 1 -
GCA_000981705.1 s__Planktomarina sp000981705 77.5716 173 738 d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Planktomarina 95.0 98.80 98.80 0.86 0.86 2 -
GCF_001458175.1 s__Shimia marina 75.8677 68 738 d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Shimia 95.0 98.13 96.28 0.94 0.89 3 -
GCF_011601345.1 s__Roseovarius gahaiensis 75.4421 65 738 d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Roseovarius 95.0 N/A N/A N/A N/A 1 -
--------------------------------------------------------------------------------
[2023-06-30 23:30:06,935] [INFO] GTDB search result was written to GCA_024643855.1_ASM2464385v1_genomic.fna/result_gtdb.tsv
[2023-06-30 23:30:06,936] [INFO] ===== GTDB Search completed =====
[2023-06-30 23:30:06,940] [INFO] DFAST_QC result json was written to GCA_024643855.1_ASM2464385v1_genomic.fna/dqc_result.json
[2023-06-30 23:30:06,940] [INFO] DFAST_QC completed!
[2023-06-30 23:30:06,940] [INFO] Total running time: 0h1m6s