[2024-01-24 11:36:20,753] [INFO] DFAST_QC pipeline started.
[2024-01-24 11:36:20,755] [INFO] DFAST_QC version: 0.5.7
[2024-01-24 11:36:20,755] [INFO] DQC Reference Directory: /var/lib/cwl/stge6099c5e-660f-4e67-9450-6d86979de722/dqc_reference
[2024-01-24 11:36:21,990] [INFO] ===== Start taxonomy check using ANI =====
[2024-01-24 11:36:21,991] [INFO] Task started: Prodigal
[2024-01-24 11:36:21,991] [INFO] Running command: gunzip -c /var/lib/cwl/stg6c55a359-ddbf-4018-9ec9-a0a225f279a1/GCF_016132445.1_ASM1613244v1_genomic.fna.gz | prodigal -d GCF_016132445.1_ASM1613244v1_genomic.fna/cds.fna -a GCF_016132445.1_ASM1613244v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2024-01-24 11:36:30,236] [INFO] Task succeeded: Prodigal
[2024-01-24 11:36:30,237] [INFO] Task started: HMMsearch
[2024-01-24 11:36:30,237] [INFO] Running command: hmmsearch --tblout GCF_016132445.1_ASM1613244v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stge6099c5e-660f-4e67-9450-6d86979de722/dqc_reference/reference_markers.hmm GCF_016132445.1_ASM1613244v1_genomic.fna/protein.faa > /dev/null
[2024-01-24 11:36:30,476] [INFO] Task succeeded: HMMsearch
[2024-01-24 11:36:30,478] [INFO] Found 6/6 markers.
[2024-01-24 11:36:30,508] [INFO] Query marker FASTA was written to GCF_016132445.1_ASM1613244v1_genomic.fna/markers.fasta
[2024-01-24 11:36:30,508] [INFO] Task started: Blastn
[2024-01-24 11:36:30,509] [INFO] Running command: blastn -query GCF_016132445.1_ASM1613244v1_genomic.fna/markers.fasta -db /var/lib/cwl/stge6099c5e-660f-4e67-9450-6d86979de722/dqc_reference/reference_markers.fasta -out GCF_016132445.1_ASM1613244v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 11:36:31,349] [INFO] Task succeeded: Blastn
[2024-01-24 11:36:31,353] [INFO] Selected 22 target genomes.
[2024-01-24 11:36:31,353] [INFO] Target genome list was writen to GCF_016132445.1_ASM1613244v1_genomic.fna/target_genomes.txt
[2024-01-24 11:36:31,426] [INFO] Task started: fastANI
[2024-01-24 11:36:31,426] [INFO] Running command: fastANI --query /var/lib/cwl/stg6c55a359-ddbf-4018-9ec9-a0a225f279a1/GCF_016132445.1_ASM1613244v1_genomic.fna.gz --refList GCF_016132445.1_ASM1613244v1_genomic.fna/target_genomes.txt --output GCF_016132445.1_ASM1613244v1_genomic.fna/fastani_result.tsv --threads 1
[2024-01-24 11:36:46,196] [INFO] Task succeeded: fastANI
[2024-01-24 11:36:46,197] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stge6099c5e-660f-4e67-9450-6d86979de722/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2024-01-24 11:36:46,198] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stge6099c5e-660f-4e67-9450-6d86979de722/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2024-01-24 11:36:46,218] [INFO] Found 22 fastANI hits (0 hits with ANI > threshold)
[2024-01-24 11:36:46,218] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2024-01-24 11:36:46,218] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Thalassolituus marinus	strain=IMCC1826	GCA_020320395.1	671053	671053	type	True	79.4174	391	942	95	below_threshold
Oleibacter marinus	strain=DSM 24913	GCA_900156675.1	484498	484498	type	True	78.3052	171	942	95	below_threshold
Oceanobacter mangrovi	strain=SM2-42	GCA_019740315.1	2862510	2862510	type	True	78.1106	229	942	95	below_threshold
Oceanobacter kriegii	strain=DSM 6294	GCA_000422845.1	64972	64972	type	True	77.7734	262	942	95	below_threshold
Bacterioplanes sanyensis	strain=KCTC 32220	GCA_014652515.1	1249553	1249553	type	True	77.5351	223	942	95	below_threshold
Microbulbifer celer	strain=KCTC 12973	GCA_020991125.1	435905	435905	type	True	77.3888	81	942	95	below_threshold
Pseudomonas spelaei	strain=CCM 7893	GCA_009724245.1	1055469	1055469	type	True	77.286	79	942	95	below_threshold
Pseudomonas sihuiensis	strain=KCTC 32246	GCA_900106015.1	1274359	1274359	type	True	77.2358	105	942	95	below_threshold
Pseudomonas eucalypticola	strain=NP-1	GCA_013374995.1	2599595	2599595	type	True	77.2232	97	942	95	below_threshold
Pseudomonas cavernae	strain=K2W31S-8	GCA_003595175.1	2320867	2320867	type	True	77.0546	108	942	95	below_threshold
Pseudomonas lalucatii	strain=R1b54	GCA_018398425.1	1424203	1424203	type	True	76.9168	98	942	95	below_threshold
Pseudomonas songnenensis	strain=DSM 27560T	GCA_024448495.1	1176259	1176259	type	True	76.8465	87	942	95	below_threshold
Pseudomonas songnenensis	strain=NEAU-ST5-5	GCA_003696315.1	1176259	1176259	type	True	76.7945	88	942	95	below_threshold
Halioglobus japonicus	strain=NBRC 107739	GCA_001983995.1	930805	930805	type	True	76.7516	57	942	95	below_threshold
Marinobacter pelagius	strain=CGMCC 1.6775	GCA_900114925.1	379482	379482	type	True	76.5834	89	942	95	below_threshold
Pseudomonas flexibilis	strain=ATCC 29606	GCA_000802425.1	706570	706570	type	True	76.4389	103	942	95	below_threshold
Pseudomonas salomonii	strain=ICMP 14252	GCA_900107155.1	191391	191391	type	True	76.3604	95	942	95	below_threshold
Pseudomonas flexibilis	strain=ATCC 29606	GCA_900155995.1	706570	706570	type	True	76.2781	103	942	95	below_threshold
Marinobacterium georgiense	strain=DSM 11526	GCA_900107855.1	48076	48076	type	True	76.2576	75	942	95	below_threshold
Marinobacter xestospongiae	strain=JCM 17469	GCA_023156385.1	994319	994319	type	True	76.2312	91	942	95	below_threshold
Halioglobus japonicus	strain=S1-36	GCA_002869505.1	930805	930805	type	True	76.2126	55	942	95	below_threshold
Alcanivorax marinus	strain=R8-12	GCA_025532125.1	1177169	1177169	type	True	75.934	78	942	95	below_threshold
--------------------------------------------------------------------------------
[2024-01-24 11:36:46,228] [INFO] DFAST Taxonomy check result was written to GCF_016132445.1_ASM1613244v1_genomic.fna/tc_result.tsv
[2024-01-24 11:36:46,228] [INFO] ===== Taxonomy check completed =====
[2024-01-24 11:36:46,229] [INFO] ===== Start completeness check using CheckM =====
[2024-01-24 11:36:46,229] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stge6099c5e-660f-4e67-9450-6d86979de722/dqc_reference/checkm_data
[2024-01-24 11:36:46,231] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2024-01-24 11:36:46,265] [INFO] Task started: CheckM
[2024-01-24 11:36:46,265] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCF_016132445.1_ASM1613244v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCF_016132445.1_ASM1613244v1_genomic.fna/checkm_input GCF_016132445.1_ASM1613244v1_genomic.fna/checkm_result
[2024-01-24 11:37:15,486] [INFO] Task succeeded: CheckM
[2024-01-24 11:37:15,488] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2024-01-24 11:37:15,508] [INFO] ===== Completeness check finished =====
[2024-01-24 11:37:15,509] [INFO] ===== Start GTDB Search =====
[2024-01-24 11:37:15,509] [INFO] Query marker FASTA already exists. Will reuse it. (GCF_016132445.1_ASM1613244v1_genomic.fna/markers.fasta)
[2024-01-24 11:37:15,509] [INFO] Task started: Blastn
[2024-01-24 11:37:15,510] [INFO] Running command: blastn -query GCF_016132445.1_ASM1613244v1_genomic.fna/markers.fasta -db /var/lib/cwl/stge6099c5e-660f-4e67-9450-6d86979de722/dqc_reference/reference_markers_gtdb.fasta -out GCF_016132445.1_ASM1613244v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 11:37:16,900] [INFO] Task succeeded: Blastn
[2024-01-24 11:37:16,905] [INFO] Selected 9 target genomes.
[2024-01-24 11:37:16,905] [INFO] Target genome list was writen to GCF_016132445.1_ASM1613244v1_genomic.fna/target_genomes_gtdb.txt
[2024-01-24 11:37:16,915] [INFO] Task started: fastANI
[2024-01-24 11:37:16,915] [INFO] Running command: fastANI --query /var/lib/cwl/stg6c55a359-ddbf-4018-9ec9-a0a225f279a1/GCF_016132445.1_ASM1613244v1_genomic.fna.gz --refList GCF_016132445.1_ASM1613244v1_genomic.fna/target_genomes_gtdb.txt --output GCF_016132445.1_ASM1613244v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2024-01-24 11:37:24,720] [INFO] Task succeeded: fastANI
[2024-01-24 11:37:24,731] [INFO] Found 9 fastANI hits (1 hits with ANI > circumscription radius)
[2024-01-24 11:37:24,731] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCA_016132445.1	s__UBA2009 sp016132445	100.0	941	942	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__DSM-6294;g__UBA2009	95.0	97.98	97.98	0.94	0.94	2	conclusive
GCF_007785795.1	s__UBA2009 sp002335285	80.0292	558	942	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__DSM-6294;g__UBA2009	95.0	97.54	97.44	0.92	0.92	6	-
GCA_002733205.1	s__UBA2009 sp002733205	79.6781	478	942	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__DSM-6294;g__UBA2009	95.0	96.83	96.83	0.84	0.84	2	-
GCA_002314145.1	s__UBA2009 sp002314145	78.732	362	942	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__DSM-6294;g__UBA2009	95.0	99.98	99.96	0.95	0.88	5	-
GCA_002706025.1	s__UBA2009 sp002706025	78.7275	360	942	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__DSM-6294;g__UBA2009	95.0	98.78	97.86	0.91	0.85	7	-
GCF_900156675.1	s__Oleibacter marinus	78.3278	170	942	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__DSM-6294;g__Oleibacter	95.0	98.96	98.96	0.99	0.99	2	-
GCA_002724925.1	s__Oleibacter sp002724925	78.158	174	942	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__DSM-6294;g__Oleibacter	95.0	98.64	97.85	0.94	0.89	5	-
GCF_000422845.1	s__Oceanobacter kriegii	77.8463	256	942	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__DSM-6294;g__Oceanobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCA_006212115.1	s__Oleibacter sp006212115	76.8757	137	942	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__DSM-6294;g__Oleibacter	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2024-01-24 11:37:24,733] [INFO] GTDB search result was written to GCF_016132445.1_ASM1613244v1_genomic.fna/result_gtdb.tsv
[2024-01-24 11:37:24,733] [INFO] ===== GTDB Search completed =====
[2024-01-24 11:37:24,737] [INFO] DFAST_QC result json was written to GCF_016132445.1_ASM1613244v1_genomic.fna/dqc_result.json
[2024-01-24 11:37:24,737] [INFO] DFAST_QC completed!
[2024-01-24 11:37:24,737] [INFO] Total running time: 0h1m4s