[2023-06-30 19:32:56,865] [INFO] DFAST_QC pipeline started.
[2023-06-30 19:32:56,868] [INFO] DFAST_QC version: 0.5.7
[2023-06-30 19:32:56,868] [INFO] DQC Reference Directory: /var/lib/cwl/stg9c7d362d-006c-4528-a092-4fd6eeb5397c/dqc_reference
[2023-06-30 19:32:58,500] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-30 19:32:58,501] [INFO] Task started: Prodigal
[2023-06-30 19:32:58,502] [INFO] Running command: gunzip -c /var/lib/cwl/stg31790951-2451-481c-838b-b84a552c2020/GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna.gz | prodigal -d GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/cds.fna -a GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-30 19:33:05,735] [INFO] Task succeeded: Prodigal
[2023-06-30 19:33:05,736] [INFO] Task started: HMMsearch
[2023-06-30 19:33:05,736] [INFO] Running command: hmmsearch --tblout GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg9c7d362d-006c-4528-a092-4fd6eeb5397c/dqc_reference/reference_markers.hmm GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/protein.faa > /dev/null
[2023-06-30 19:33:06,007] [INFO] Task succeeded: HMMsearch
[2023-06-30 19:33:06,008] [INFO] Found 6/6 markers.
[2023-06-30 19:33:06,038] [INFO] Query marker FASTA was written to GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/markers.fasta
[2023-06-30 19:33:06,038] [INFO] Task started: Blastn
[2023-06-30 19:33:06,039] [INFO] Running command: blastn -query GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/markers.fasta -db /var/lib/cwl/stg9c7d362d-006c-4528-a092-4fd6eeb5397c/dqc_reference/reference_markers.fasta -out GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-30 19:33:07,105] [INFO] Task succeeded: Blastn
[2023-06-30 19:33:07,110] [INFO] Selected 35 target genomes.
[2023-06-30 19:33:07,110] [INFO] Target genome list was writen to GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/target_genomes.txt
[2023-06-30 19:33:07,117] [INFO] Task started: fastANI
[2023-06-30 19:33:07,117] [INFO] Running command: fastANI --query /var/lib/cwl/stg31790951-2451-481c-838b-b84a552c2020/GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna.gz --refList GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/target_genomes.txt --output GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/fastani_result.tsv --threads 1
[2023-06-30 19:33:39,125] [INFO] Task succeeded: fastANI
[2023-06-30 19:33:39,126] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg9c7d362d-006c-4528-a092-4fd6eeb5397c/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-30 19:33:39,126] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg9c7d362d-006c-4528-a092-4fd6eeb5397c/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-30 19:33:39,151] [INFO] Found 35 fastANI hits (0 hits with ANI > threshold)
[2023-06-30 19:33:39,151] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-06-30 19:33:39,151] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Motilibacter aurantiacus	strain=K478	GCA_011250645.1	2714955	2714955	type	True	77.2648	121	655	95	below_threshold
Motilibacter deserti	strain=E257	GCA_011250635.1	2714956	2714956	type	True	77.0836	119	655	95	below_threshold
Nocardioides ganghwensis	strain=9920	GCA_014779655.1	252230	252230	type	True	77.0783	122	655	95	below_threshold
Nocardioides gansuensis	strain=WSJ-1	GCA_003076135.1	2138300	2138300	type	True	77.0222	117	655	95	below_threshold
Nocardioides lacusdianchii	strain=JXJ CY 38	GCA_020102855.1	2783664	2783664	type	True	76.9809	128	655	95	below_threshold
Nocardioides okcheonensis	strain=MMS20-HV4-12	GCA_020991065.1	2894081	2894081	type	True	76.8085	126	655	95	below_threshold
Jiangella rhizosphaerae	strain=NEAU-YY265	GCA_003579925.1	2293569	2293569	type	True	76.7217	110	655	95	below_threshold
Streptomyces mashuensis	strain=JCM 4059	GCA_014654785.1	33904	33904	type	True	76.6871	124	655	95	below_threshold
Nocardioides jishulii	strain=dk3136	GCA_006007965.1	2575440	2575440	type	True	76.6529	110	655	95	below_threshold
Nonomuraea lactucae	strain=NEAU-YG30	GCA_003313395.1	2249762	2249762	type	True	76.6329	135	655	95	below_threshold
Nonomuraea muscovyensis	strain=DSM 45913	GCA_014207745.1	1124761	1124761	type	True	76.6263	142	655	95	below_threshold
Streptomyces hiroshimensis	strain=JCM 4586	GCA_014650335.1	66424	66424	type	True	76.6172	125	655	95	below_threshold
Streptomyces mobaraensis	strain=DSM 40847	GCA_017916255.1	35621	35621	type	True	76.6156	117	655	95	below_threshold
Nocardioides furvisabuli	strain=JCM 13813	GCA_021083185.1	375542	375542	type	True	76.5885	126	655	95	below_threshold
Nocardioides seonyuensis	strain=MMS17-SY207-3	GCA_004683965.1	2518371	2518371	type	True	76.5877	115	655	95	below_threshold
Streptomyces mobaraensis	strain=DSM 40847	GCA_000342125.1	35621	35621	type	True	76.5745	118	655	95	below_threshold
Kitasatospora humi	strain=RB6PN24	GCA_020907985.1	2893891	2893891	type	True	76.5389	122	655	95	below_threshold
Jiangella alba	strain=DSM 45237	GCA_900106035.1	561176	561176	type	True	76.5359	113	655	95	below_threshold
Jiangella ureilytica	strain=KC603	GCA_004348545.1	2530374	2530374	type	True	76.5279	129	655	95	below_threshold
Kitasatospora acidiphila	strain=MMS16-CNU292	GCA_006636205.1	2567942	2567942	type	True	76.5254	109	655	95	below_threshold
Jiangella gansuensis	strain=DSM 44835	GCA_000515395.1	281473	281473	type	True	76.5229	95	655	95	below_threshold
Actinomadura viridis	strain=DSM 43175	GCA_015751755.1	58110	58110	type	True	76.4613	112	655	95	below_threshold
Streptomyces cyaneochromogenes	strain=MK-45	GCA_003963535.1	2496836	2496836	type	True	76.4407	121	655	95	below_threshold
Streptomyces nigra	strain=452	GCA_003074055.1	1827580	1827580	type	True	76.4216	127	655	95	below_threshold
Agromyces agglutinans	strain=CFH 90414	GCA_009647605.1	2662258	2662258	type	True	76.4069	86	655	95	below_threshold
Actinomadura bangladeshensis	strain=DSM 45347	GCA_004348335.1	453573	453573	type	True	76.3959	97	655	95	below_threshold
Sphaerisporangium corydalis	strain=NEAU-YHS15	GCA_025506355.1	1441875	1441875	type	True	76.3819	127	655	95	below_threshold
Actinotalea solisilvae	strain=KACC 19191	GCA_016464425.1	2072922	2072922	type	True	76.3596	118	655	95	below_threshold
Cellulomonas persica	strain=NBRC 101101	GCA_007989825.1	76861	76861	type	True	76.2952	96	655	95	below_threshold
Isoptericola jiangsuensis	strain=DSM 21863	GCA_002563715.1	548579	548579	type	True	76.2011	94	655	95	below_threshold
Actinomadura citrea	strain=DSM 43461	GCA_013409045.1	46158	46158	type	True	76.1225	112	655	95	below_threshold
Cellulomonas xylanilytica	strain=NBRC 101102	GCA_007989805.1	233583	233583	type	True	76.1016	103	655	95	below_threshold
Actinomadura mexicana	strain=DSM 44485	GCA_900188105.1	134959	134959	type	True	76.0774	107	655	95	below_threshold
Actinomadura coerulea	strain=DSM 43675	GCA_014208105.1	46159	46159	type	True	75.9824	120	655	95	below_threshold
Actinomadura coerulea	strain=JCM 3320	GCA_014648535.1	46159	46159	type	True	75.9532	120	655	95	below_threshold
--------------------------------------------------------------------------------
[2023-06-30 19:33:39,153] [INFO] DFAST Taxonomy check result was written to GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/tc_result.tsv
[2023-06-30 19:33:39,154] [INFO] ===== Taxonomy check completed =====
[2023-06-30 19:33:39,154] [INFO] ===== Start completeness check using CheckM =====
[2023-06-30 19:33:39,154] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg9c7d362d-006c-4528-a092-4fd6eeb5397c/dqc_reference/checkm_data
[2023-06-30 19:33:39,157] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-30 19:33:39,188] [INFO] Task started: CheckM
[2023-06-30 19:33:39,188] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/checkm_input GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/checkm_result
[2023-06-30 19:34:05,668] [INFO] Task succeeded: CheckM
[2023-06-30 19:34:05,670] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamintation: 2.55%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-06-30 19:34:05,693] [INFO] ===== Completeness check finished =====
[2023-06-30 19:34:05,694] [INFO] ===== Start GTDB Search =====
[2023-06-30 19:34:05,694] [INFO] Query marker FASTA already exists. Will reuse it.
(GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/markers.fasta)
[2023-06-30 19:34:05,695] [INFO] Task started: Blastn
[2023-06-30 19:34:05,695] [INFO] Running command: blastn -query GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/markers.fasta -db /var/lib/cwl/stg9c7d362d-006c-4528-a092-4fd6eeb5397c/dqc_reference/reference_markers_gtdb.fasta -out GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-30 19:34:07,386] [INFO] Task succeeded: Blastn
[2023-06-30 19:34:07,391] [INFO] Selected 13 target genomes.
[2023-06-30 19:34:07,392] [INFO] Target genome list was writen to GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/target_genomes_gtdb.txt
[2023-06-30 19:34:07,416] [INFO] Task started: fastANI
[2023-06-30 19:34:07,417] [INFO] Running command: fastANI --query /var/lib/cwl/stg31790951-2451-481c-838b-b84a552c2020/GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna.gz --refList GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/target_genomes_gtdb.txt --output GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-30 19:34:15,637] [INFO] Task succeeded: fastANI
[2023-06-30 19:34:15,652] [INFO] Found 13 fastANI hits (1 hits with ANI > circumscription radius)
[2023-06-30 19:34:15,652] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCA_903954205.1	s__Mxb001 sp903954205	98.2605	445	655	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Nanopelagicales;f__S36-B12;g__Mxb001	95.0	98.46	98.20	0.79	0.70	14	conclusive
GCA_016870425.1	s__Mxb001 sp016870425	81.7147	416	655	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Nanopelagicales;f__S36-B12;g__Mxb001	95.0	N/A	N/A	N/A	N/A	1	-
GCA_004379115.1	s__Mxb001 sp004379115	80.02	315	655	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Nanopelagicales;f__S36-B12;g__Mxb001	95.0	N/A	N/A	N/A	N/A	1	-
GCA_016870515.1	s__Mxb001 sp016870515	79.4185	334	655	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Nanopelagicales;f__S36-B12;g__Mxb001	95.0	N/A	N/A	N/A	N/A	1	-
GCA_903848435.1	s__Mxb001 sp903848435	79.2841	236	655	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Nanopelagicales;f__S36-B12;g__Mxb001	95.0	96.93	95.89	0.80	0.74	3	-
GCA_903909555.1	s__Mxb001 sp903909555	79.09	310	655	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Nanopelagicales;f__S36-B12;g__Mxb001	95.0	99.15	99.05	0.83	0.81	4	-
GCA_903934925.1	s__UBA10649 sp903934925	77.9061	148	655	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Nanopelagicales;f__S36-B12;g__UBA10649	95.0	99.42	99.35	0.85	0.83	4	-
GCA_016717115.1	s__UBA10649 sp016717115	77.8632	165	655	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Nanopelagicales;f__S36-B12;g__UBA10649	95.0	N/A	N/A	N/A	N/A	1	-
GCA_017856385.1	s__UBA10649 sp017856385	77.5531	170	655	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Nanopelagicales;f__S36-B12;g__UBA10649	95.0	N/A	N/A	N/A	N/A	1	-
GCF_015390235.1	s__Quadrisphaera sp015390235	76.9013	100	655	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Actinomycetales;f__Quadrisphaeraceae;g__Quadrisphaera	95.0	N/A	N/A	N/A	N/A	1	-
GCF_017916455.1	s__Nocardioides sp017916455	76.738	103	655	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Propionibacteriales;f__Nocardioidaceae;g__Nocardioides	95.0	98.92	98.92	0.96	0.96	2	-
GCF_900659615.1	s__Spirillospora fibrosa	76.0594	116	655	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Streptosporangiales;f__Streptosporangiaceae;g__Spirillospora	95.0	N/A	N/A	N/A	N/A	1	-
GCF_014208105.1	s__Spirillospora coerulea	75.981	120	655	d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Streptosporangiales;f__Streptosporangiaceae;g__Spirillospora	95.0	100.00	100.00	1.00	1.00	2	-
--------------------------------------------------------------------------------
[2023-06-30 19:34:15,690] [INFO] GTDB search result was written to GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/result_gtdb.tsv
[2023-06-30 19:34:15,691] [INFO] ===== GTDB Search completed =====
[2023-06-30 19:34:15,697] [INFO] DFAST_QC result json was written to GCA_903953595.1_freshwater_MAG_---_Loc090519-4m_bin-561_genomic.fna/dqc_result.json
[2023-06-30 19:34:15,698] [INFO] DFAST_QC completed!
[2023-06-30 19:34:15,698] [INFO] Total running time: 0h1m19s