[2023-06-27 00:13:11,326] [INFO] DFAST_QC pipeline started.
[2023-06-27 00:13:11,329] [INFO] DFAST_QC version: 0.5.7
[2023-06-27 00:13:11,330] [INFO] DQC Reference Directory: /var/lib/cwl/stg7b82b801-2703-4d08-b76f-0888b63bf32e/dqc_reference
[2023-06-27 00:13:13,028] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-27 00:13:13,031] [INFO] Task started: Prodigal
[2023-06-27 00:13:13,031] [INFO] Running command: gunzip -c /var/lib/cwl/stg79fa9307-4ab6-490a-bd14-e2d4c59da779/GCA_026419405.1_ASM2641940v1_genomic.fna.gz | prodigal -d GCA_026419405.1_ASM2641940v1_genomic.fna/cds.fna -a GCA_026419405.1_ASM2641940v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-27 00:13:21,616] [INFO] Task succeeded: Prodigal
[2023-06-27 00:13:21,616] [INFO] Task started: HMMsearch
[2023-06-27 00:13:21,616] [INFO] Running command: hmmsearch --tblout GCA_026419405.1_ASM2641940v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg7b82b801-2703-4d08-b76f-0888b63bf32e/dqc_reference/reference_markers.hmm GCA_026419405.1_ASM2641940v1_genomic.fna/protein.faa > /dev/null
[2023-06-27 00:13:21,933] [INFO] Task succeeded: HMMsearch
[2023-06-27 00:13:21,935] [WARNING] Found 5/6 markers. [/var/lib/cwl/stg79fa9307-4ab6-490a-bd14-e2d4c59da779/GCA_026419405.1_ASM2641940v1_genomic.fna.gz]
[2023-06-27 00:13:22,008] [INFO] Query marker FASTA was written to GCA_026419405.1_ASM2641940v1_genomic.fna/markers.fasta
[2023-06-27 00:13:22,008] [INFO] Task started: Blastn
[2023-06-27 00:13:22,008] [INFO] Running command: blastn -query GCA_026419405.1_ASM2641940v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg7b82b801-2703-4d08-b76f-0888b63bf32e/dqc_reference/reference_markers.fasta -out GCA_026419405.1_ASM2641940v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-27 00:13:23,034] [INFO] Task succeeded: Blastn
[2023-06-27 00:13:23,039] [INFO] Selected 27 target genomes.
[2023-06-27 00:13:23,040] [INFO] Target genome list was writen to GCA_026419405.1_ASM2641940v1_genomic.fna/target_genomes.txt
[2023-06-27 00:13:23,056] [INFO] Task started: fastANI
[2023-06-27 00:13:23,056] [INFO] Running command: fastANI --query /var/lib/cwl/stg79fa9307-4ab6-490a-bd14-e2d4c59da779/GCA_026419405.1_ASM2641940v1_genomic.fna.gz --refList GCA_026419405.1_ASM2641940v1_genomic.fna/target_genomes.txt --output GCA_026419405.1_ASM2641940v1_genomic.fna/fastani_result.tsv --threads 1
[2023-06-27 00:13:46,304] [INFO] Task succeeded: fastANI
[2023-06-27 00:13:46,304] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg7b82b801-2703-4d08-b76f-0888b63bf32e/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-27 00:13:46,305] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg7b82b801-2703-4d08-b76f-0888b63bf32e/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-27 00:13:46,325] [INFO] Found 27 fastANI hits (0 hits with ANI > threshold)
[2023-06-27 00:13:46,325] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-06-27 00:13:46,325] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Limimaricola pyoseonensis	strain=DSM 21424	GCA_900102015.1	521013	521013	type	True	79.9705	565	1102	95	below_threshold
Cereibacter azotoformans	strain=KA25	GCA_003050905.1	43057	43057	type	True	79.5406	524	1102	95	below_threshold
Cereibacter sediminicola	strain=JA983	GCA_007668225.1	2584941	2584941	type	True	79.3948	531	1102	95	below_threshold
Rhodobacter calidifons	strain=M37P	GCA_011174775.1	2715277	2715277	type	True	79.2526	532	1102	95	below_threshold
Rhodobacter thermarum	strain=YIM 73036	GCA_003574395.1	2670345	2670345	type	True	79.1143	482	1102	95	below_threshold
Wenxinia marina	strain=DSM 24838	GCA_000379485.1	390641	390641	type	True	79.1063	582	1102	95	below_threshold
Wenxinia marina	strain=DSM 24838	GCA_000836695.1	390641	390641	type	True	79.0613	579	1102	95	below_threshold
Wenxinia marina	strain=CGMCC 1.6105	GCA_014645075.1	390641	390641	type	True	79.0541	591	1102	95	below_threshold
Rhodovulum euryhalinum	strain=DSM 4868	GCA_004342445.1	35805	35805	type	True	79.0369	511	1102	95	below_threshold
Roseivivax isoporae	strain=LMG 25204	GCA_000521865.1	591206	591206	type	True	79.019	590	1102	95	below_threshold
Paracoccus sanguinis	strain=DSM 29303	GCA_900106665.1	1545044	1545044	type	True	78.9006	511	1102	95	below_threshold
Sinirhodobacter huangdaonensis	strain=CGMCC 1.12963	GCA_004022465.1	2501515	2501515	type	True	78.8538	558	1102	95	below_threshold
Rhodovulum robiginosum	strain=DSM 12329	GCA_003944755.1	68292	68292	type	True	78.7001	530	1102	95	below_threshold
Rhodovulum tesquicola	strain=A-36s	GCA_024128855.1	540254	540254	type	True	78.6659	482	1102	95	below_threshold
Paracoccus solventivorans	strain=DSM 6637	GCA_900142875.1	53463	53463	type	True	78.4988	379	1102	95	below_threshold
Defluviimonas aquaemixtae	strain=CECT 8626	GCA_900302475.1	1542388	1542388	type	True	78.4932	448	1102	95	below_threshold
Pseudoroseicyclus aestuarii	strain=CECT 9025	GCA_003217255.1	1795041	1795041	type	True	78.4504	434	1102	95	below_threshold
Frigidibacter mobilis	strain=cai42	GCA_001620265.1	1335048	1335048	type	True	78.4497	516	1102	95	below_threshold
Rhodovulum strictum	strain=DSM 11289	GCA_009649175.1	58314	58314	type	True	78.406	465	1102	95	below_threshold
Silicimonas algicola	strain=DSM 103371	GCA_003148765.1	1826607	1826607	type	True	78.3859	424	1102	95	below_threshold
Paracoccus sphaerophysae	strain=HAMBI 3106	GCA_000763805.1	690417	690417	type	True	78.2713	428	1102	95	below_threshold
Brevirhabdus pacifica	strain=22DY15	GCA_002094875.1	1267768	1267768	type	True	78.1511	350	1102	95	below_threshold
Cereibacter ovatus	strain=JA234	GCA_900207575.1	439529	439529	type	True	78.0966	408	1102	95	below_threshold
Palleronia sediminis	strain=SS33	GCA_004358695.1	2547833	2547833	type	True	78.0159	414	1102	95	below_threshold
Brevirhabdus pacifica	strain=DSM 27767	GCA_002797755.1	1267768	1267768	type	True	77.973	398	1102	95	below_threshold
Leisingera aquaemixtae	strain=CECT 8399	GCA_001458395.1	1396826	1396826	type	True	77.3884	380	1102	95	below_threshold
Leisingera daeponensis	strain=DSM 23529	GCA_000473145.1	405746	405746	type	True	77.3659	376	1102	95	below_threshold
--------------------------------------------------------------------------------
[2023-06-27 00:13:46,332] [INFO] DFAST Taxonomy check result was written to GCA_026419405.1_ASM2641940v1_genomic.fna/tc_result.tsv
[2023-06-27 00:13:46,333] [INFO] ===== Taxonomy check completed =====
[2023-06-27 00:13:46,333] [INFO] ===== Start completeness check using CheckM =====
[2023-06-27 00:13:46,333] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg7b82b801-2703-4d08-b76f-0888b63bf32e/dqc_reference/checkm_data
[2023-06-27 00:13:46,334] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-27 00:13:46,378] [INFO] Task started: CheckM
[2023-06-27 00:13:46,378] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_026419405.1_ASM2641940v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_026419405.1_ASM2641940v1_genomic.fna/checkm_input GCA_026419405.1_ASM2641940v1_genomic.fna/checkm_result
[2023-06-27 00:14:28,100] [INFO] Task succeeded: CheckM
[2023-06-27 00:14:28,101] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 95.83%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-06-27 00:14:28,124] [INFO] ===== Completeness check finished =====
[2023-06-27 00:14:28,124] [INFO] ===== Start GTDB Search =====
[2023-06-27 00:14:28,125] [INFO] Query marker FASTA already exists. Will reuse it. (GCA_026419405.1_ASM2641940v1_genomic.fna/markers.fasta)
[2023-06-27 00:14:28,125] [INFO] Task started: Blastn
[2023-06-27 00:14:28,125] [INFO] Running command: blastn -query GCA_026419405.1_ASM2641940v1_genomic.fna/markers.fasta -db /var/lib/cwl/stg7b82b801-2703-4d08-b76f-0888b63bf32e/dqc_reference/reference_markers_gtdb.fasta -out GCA_026419405.1_ASM2641940v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-27 00:14:30,002] [INFO] Task succeeded: Blastn
[2023-06-27 00:14:30,009] [INFO] Selected 30 target genomes.
[2023-06-27 00:14:30,009] [INFO] Target genome list was writen to GCA_026419405.1_ASM2641940v1_genomic.fna/target_genomes_gtdb.txt
[2023-06-27 00:14:30,037] [INFO] Task started: fastANI
[2023-06-27 00:14:30,038] [INFO] Running command: fastANI --query /var/lib/cwl/stg79fa9307-4ab6-490a-bd14-e2d4c59da779/GCA_026419405.1_ASM2641940v1_genomic.fna.gz --refList GCA_026419405.1_ASM2641940v1_genomic.fna/target_genomes_gtdb.txt --output GCA_026419405.1_ASM2641940v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-27 00:14:53,425] [INFO] Task succeeded: fastANI
[2023-06-27 00:14:53,458] [INFO] Found 30 fastANI hits (0 hits with ANI > circumscription radius)
[2023-06-27 00:14:53,459] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCA_011046725.1	s__DSPG01 sp011046725	80.473	587	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__DSPG01	95.0	N/A	N/A	N/A	N/A	1	-
GCA_008933605.1	s__Albidovulum sp008933605	79.919	505	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Albidovulum	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900102015.1	s__Limimaricola pyoseonensis	79.9046	572	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Limimaricola	95.0	N/A	N/A	N/A	N/A	1	-
GCF_012395815.1	s__Roseicyclus sp012395815	79.686	616	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Roseicyclus	95.0	N/A	N/A	N/A	N/A	1	-
GCA_011620265.1	s__Albidovulum sp011620265	79.5933	558	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Albidovulum	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003298775.1	s__Rhodosalinus sp003298775	79.3548	550	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Rhodosalinus	95.0	N/A	N/A	N/A	N/A	1	-
GCA_007131945.1	s__Pararhodobacter sp007131945	79.3313	425	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Pararhodobacter	95.0	99.16	99.15	0.86	0.82	3	-
GCA_002280515.1	s__Albidovulum sp002280515	79.2901	520	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Albidovulum	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002869745.1	s__Oceaniglobus roseus	79.252	611	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Oceaniglobus	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000153305.1	s__Oceanicola granulosus	79.2432	570	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Oceanicola	95.0	N/A	N/A	N/A	N/A	1	-
GCA_002841115.1	s__Albidovulum sp002841115	79.1493	540	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Albidovulum	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000379485.1	s__Wenxinia marina	79.1058	583	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Wenxinia	95.0	99.99	99.99	0.99	0.99	3	-
GCA_015689745.1	s__Roseicyclus sp015689745	79.0555	499	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Roseicyclus	95.0	N/A	N/A	N/A	N/A	1	-
GCF_004010155.1	s__Solirhodobacter olei	79.0438	507	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Solirhodobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCF_004342445.1	s__Rhodovulum euryhalinum	79.0319	511	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Rhodovulum	95.0	N/A	N/A	N/A	N/A	1	-
GCA_018240425.1	s__TMP-24 sp018240425	79.0191	496	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__TMP-24	95.0	99.95	99.95	0.97	0.97	2	-
GCA_001314685.1	s__HLUCCA09 sp001314685	79.0099	536	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__HLUCCA09	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000442255.1	s__Salipiger mucosus	79.0071	570	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Salipiger	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003340565.1	s__HLUCCA09 sp003340565	78.9997	594	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__HLUCCA09	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003993775.1	s__Frigidibacter sp003993775	78.8268	551	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Frigidibacter	95.0	N/A	N/A	N/A	N/A	1	-
GCF_003944755.1	s__Rhodovulum robiginosum	78.7152	528	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Rhodovulum	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001633165.1	s__Rhodovulum sulfidophilum	78.5359	466	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Rhodovulum	95.0	97.78	97.16	0.92	0.88	14	-
GCF_900302475.1	s__Albidovulum aquaemixtae	78.4999	445	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Albidovulum	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001620265.1	s__Frigidibacter mobilis	78.4499	516	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Frigidibacter	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900109035.1	s__Cribrihabitans marinus	78.4225	463	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Cribrihabitans	95.0	100.00	100.00	0.99	0.99	2	-
GCF_003789055.1	s__Oceanicola lentulus	78.3788	515	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Oceanicola	95.0	N/A	N/A	N/A	N/A	1	-
GCA_003241785.1	s__Amaricoccus sp003241785	78.275	521	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Amaricoccus	95.0	N/A	N/A	N/A	N/A	1	-
GCA_002280405.1	s__Tabrizicola sp002280405	78.1562	356	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Tabrizicola	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002797755.1	s__Brevirhabdus pacifica	77.9908	396	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Brevirhabdus	95.0	99.99	99.97	0.99	0.98	4	-
GCA_003550665.1	s__Rhodobaculum sp003550665	77.9842	362	1102	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Rhodobaculum	95.0	99.32	99.19	0.86	0.85	4	-
--------------------------------------------------------------------------------
[2023-06-27 00:14:53,473] [INFO] GTDB search result was written to GCA_026419405.1_ASM2641940v1_genomic.fna/result_gtdb.tsv
[2023-06-27 00:14:53,474] [INFO] ===== GTDB Search completed =====
[2023-06-27 00:14:53,488] [INFO] DFAST_QC result json was written to GCA_026419405.1_ASM2641940v1_genomic.fna/dqc_result.json
[2023-06-27 00:14:53,488] [INFO] DFAST_QC completed!
[2023-06-27 00:14:53,488] [INFO] Total running time: 0h1m42s