[2023-06-27 05:00:35,900] [INFO] DFAST_QC pipeline started.
[2023-06-27 05:00:35,903] [INFO] DFAST_QC version: 0.5.7
[2023-06-27 05:00:35,903] [INFO] DQC Reference Directory: /var/lib/cwl/stgd6a8dd4e-a1af-4d7f-9bee-17e41980832b/dqc_reference
[2023-06-27 05:00:37,166] [INFO] ===== Start taxonomy check using ANI =====
[2023-06-27 05:00:37,167] [INFO] Task started: Prodigal
[2023-06-27 05:00:37,167] [INFO] Running command: gunzip -c /var/lib/cwl/stg4e503540-9574-4841-8e2f-b3936c6ffda3/GCA_002428565.1_ASM242856v1_genomic.fna.gz | prodigal -d GCA_002428565.1_ASM242856v1_genomic.fna/cds.fna -a GCA_002428565.1_ASM242856v1_genomic.fna/protein.faa -g 11 -q > /dev/null
[2023-06-27 05:00:49,934] [INFO] Task succeeded: Prodigal
[2023-06-27 05:00:49,934] [INFO] Task started: HMMsearch
[2023-06-27 05:00:49,934] [INFO] Running command: hmmsearch --tblout GCA_002428565.1_ASM242856v1_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stgd6a8dd4e-a1af-4d7f-9bee-17e41980832b/dqc_reference/reference_markers.hmm GCA_002428565.1_ASM242856v1_genomic.fna/protein.faa > /dev/null
[2023-06-27 05:00:50,233] [INFO] Task succeeded: HMMsearch
[2023-06-27 05:00:50,234] [INFO] Found 6/6 markers.
[2023-06-27 05:00:50,281] [INFO] Query marker FASTA was written to GCA_002428565.1_ASM242856v1_genomic.fna/markers.fasta
[2023-06-27 05:00:50,282] [INFO] Task started: Blastn
[2023-06-27 05:00:50,282] [INFO] Running command: blastn -query GCA_002428565.1_ASM242856v1_genomic.fna/markers.fasta -db /var/lib/cwl/stgd6a8dd4e-a1af-4d7f-9bee-17e41980832b/dqc_reference/reference_markers.fasta -out GCA_002428565.1_ASM242856v1_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-27 05:00:50,891] [INFO] Task succeeded: Blastn
[2023-06-27 05:00:50,894] [INFO] Selected 23 target genomes.
[2023-06-27 05:00:50,895] [INFO] Target genome list was writen to GCA_002428565.1_ASM242856v1_genomic.fna/target_genomes.txt
[2023-06-27 05:00:50,903] [INFO] Task started: fastANI
[2023-06-27 05:00:50,903] [INFO] Running command: fastANI --query /var/lib/cwl/stg4e503540-9574-4841-8e2f-b3936c6ffda3/GCA_002428565.1_ASM242856v1_genomic.fna.gz --refList GCA_002428565.1_ASM242856v1_genomic.fna/target_genomes.txt --output GCA_002428565.1_ASM242856v1_genomic.fna/fastani_result.tsv --threads 1
[2023-06-27 05:01:05,843] [INFO] Task succeeded: fastANI
[2023-06-27 05:01:05,844] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stgd6a8dd4e-a1af-4d7f-9bee-17e41980832b/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-06-27 05:01:05,844] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stgd6a8dd4e-a1af-4d7f-9bee-17e41980832b/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-06-27 05:01:05,857] [INFO] Found 6 fastANI hits (0 hits with ANI > threshold)
[2023-06-27 05:01:05,857] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-06-27 05:01:05,857] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name strain accession taxid species_taxid relation_to_type validated ani matched_fragments total_fragments ani_threshold status
Levilinea saccharolytica strain=KIBI-1 GCA_001306035.1 229921 229921 type True 76.3596 129 1343 95 below_threshold
Levilinea saccharolytica strain=KIBI-1 GCA_001050255.2 229921 229921 type True 76.3183 127 1343 95 below_threshold
Longilinea arvoryzae strain=KOME-1 GCA_001050235.2 360412 360412 type True 76.0676 101 1343 95 below_threshold
Ornatilinea apprima strain=P3M-1 GCA_001306115.1 1134406 1134406 type True 76.0229 65 1343 95 below_threshold
Anaerolinea thermophila strain=UNI-1 GCA_000199675.1 167964 167964 type True 75.9977 60 1343 95 below_threshold
Aggregatilinea lenta strain=MO-CFX2 GCA_003569045.1 913108 913108 type True 75.3314 61 1343 95 below_threshold
--------------------------------------------------------------------------------
[2023-06-27 05:01:05,859] [INFO] DFAST Taxonomy check result was written to GCA_002428565.1_ASM242856v1_genomic.fna/tc_result.tsv
[2023-06-27 05:01:05,860] [INFO] ===== Taxonomy check completed =====
[2023-06-27 05:01:05,860] [INFO] ===== Start completeness check using CheckM =====
[2023-06-27 05:01:05,860] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stgd6a8dd4e-a1af-4d7f-9bee-17e41980832b/dqc_reference/checkm_data
[2023-06-27 05:01:05,862] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-06-27 05:01:05,907] [INFO] Task started: CheckM
[2023-06-27 05:01:05,907] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCA_002428565.1_ASM242856v1_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCA_002428565.1_ASM242856v1_genomic.fna/checkm_input GCA_002428565.1_ASM242856v1_genomic.fna/checkm_result
[2023-06-27 05:01:45,583] [INFO] Task succeeded: CheckM
[2023-06-27 05:01:45,585] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 95.83%
Contamintation: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-06-27 05:01:45,610] [INFO] ===== Completeness check finished =====
[2023-06-27 05:01:45,611] [INFO] ===== Start GTDB Search =====
[2023-06-27 05:01:45,611] [INFO] Query marker FASTA already exists. Will reuse it. (GCA_002428565.1_ASM242856v1_genomic.fna/markers.fasta)
[2023-06-27 05:01:45,611] [INFO] Task started: Blastn
[2023-06-27 05:01:45,612] [INFO] Running command: blastn -query GCA_002428565.1_ASM242856v1_genomic.fna/markers.fasta -db /var/lib/cwl/stgd6a8dd4e-a1af-4d7f-9bee-17e41980832b/dqc_reference/reference_markers_gtdb.fasta -out GCA_002428565.1_ASM242856v1_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-06-27 05:01:46,480] [INFO] Task succeeded: Blastn
[2023-06-27 05:01:46,486] [INFO] Selected 29 target genomes.
[2023-06-27 05:01:46,486] [INFO] Target genome list was writen to GCA_002428565.1_ASM242856v1_genomic.fna/target_genomes_gtdb.txt
[2023-06-27 05:01:46,513] [INFO] Task started: fastANI
[2023-06-27 05:01:46,513] [INFO] Running command: fastANI --query /var/lib/cwl/stg4e503540-9574-4841-8e2f-b3936c6ffda3/GCA_002428565.1_ASM242856v1_genomic.fna.gz --refList GCA_002428565.1_ASM242856v1_genomic.fna/target_genomes_gtdb.txt --output GCA_002428565.1_ASM242856v1_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2023-06-27 05:02:02,586] [INFO] Task succeeded: fastANI
[2023-06-27 05:02:02,608] [INFO] Found 22 fastANI hits (1 hits with ANI > circumscription radius)
[2023-06-27 05:02:02,608] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession gtdb_species ani matched_fragments total_fragments gtdb_taxonomy ani_circumscription_radius mean_intra_species_ani min_intra_species_ani mean_intra_species_af min_intra_species_af num_clustered_genomes status
GCA_002428565.1 s__UBA6092 sp002428565 100.0 1337 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__UBA4823;g__UBA6092 95.0 N/A N/A N/A N/A 1 conclusive
GCA_016932955.1 s__UBA6092 sp016932955 77.5329 282 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__UBA4823;g__UBA6092 95.0 N/A N/A N/A N/A 1 -
GCA_009772985.1 s__UBA877 sp009772985 76.4645 142 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__EnvOPS12;g__UBA877 95.0 N/A N/A N/A N/A 1 -
GCA_012516075.1 s__JAAYXE01 sp012516075 76.4559 107 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__UBA4823;g__JAAYXE01 95.0 N/A N/A N/A N/A 1 -
GCA_016935755.1 s__JAFGQI01 sp016935755 76.3984 120 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__UBA4823;g__JAFGQI01 95.0 N/A N/A N/A N/A 1 -
GCF_001306035.1 s__Levilinea saccharolytica 76.3329 131 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__Anaerolineaceae;g__Levilinea 95.0 99.90 99.90 0.95 0.95 2 -
GCA_011329365.1 s__DSTM01 sp011329365 76.3057 111 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__UBA4823;g__DSTM01 95.0 N/A N/A N/A N/A 1 -
GCA_903872065.1 s__CAIQJJ01 sp903872065 76.2304 122 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__UBA4823;g__CAIQJJ01 95.0 99.39 99.39 0.86 0.86 2 -
GCA_016934715.1 s__JAFGSI01 sp016934715 76.2281 115 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__EnvOPS12;g__JAFGSI01 95.0 N/A N/A N/A N/A 1 -
GCA_002385985.1 s__UBA3924 sp002385985 76.2035 114 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__Anaerolineaceae;g__UBA3924 95.0 99.86 99.86 0.92 0.92 2 -
GCA_016191875.1 s__SCUK01 sp016191875 76.1788 123 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__UBA11579;g__SCUK01 95.0 N/A N/A N/A N/A 1 -
GCA_903829085.1 s__UBA877 sp903829085 76.1544 60 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__EnvOPS12;g__UBA877 95.0 N/A N/A N/A N/A 1 -
GCA_011055765.1 s__DSOS01 sp011055765 76.1092 106 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__UBA4823;g__DSOS01 95.0 N/A N/A N/A N/A 1 -
GCA_001795165.1 s__RBG-16-57-11 sp001795165 76.0351 106 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__UBA4823;g__RBG-16-57-11 95.0 N/A N/A N/A N/A 1 -
GCA_011051635.1 s__DRKV01 sp011051635 76.0235 84 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__DRKV01;g__DRKV01 95.0 N/A N/A N/A N/A 1 -
GCF_000199675.1 s__Anaerolinea thermophila 75.9977 60 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__Anaerolineaceae;g__Anaerolinea 95.0 N/A N/A N/A N/A 1 -
GCA_011055775.1 s__DSPF01 sp011055775 75.9754 62 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__UBA4823;g__DSPF01 95.0 N/A N/A N/A N/A 1 -
GCA_016789165.1 s__SpSt-583 sp016789165 75.949 102 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__UBA6663;g__SpSt-583 95.0 N/A N/A N/A N/A 1 -
GCA_011176535.1 s__DUEP01 sp011176535 75.831 61 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__DRMV01;g__DUEP01 95.0 99.79 99.71 0.93 0.89 3 -
GCA_016935415.1 s__JAFGRA01 sp016935415 75.7563 81 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__UBA11579;g__JAFGRA01 95.0 N/A N/A N/A N/A 1 -
GCA_003250455.1 s__UBA11579 sp003250455 75.7176 65 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__UBA11579;g__UBA11579 95.0 N/A N/A N/A N/A 1 -
GCA_017860725.1 s__CTSoil-045 sp017860725 75.6693 54 1343 d__Bacteria;p__Chloroflexota;c__Anaerolineae;o__Anaerolineales;f__UBA4823;g__CTSoil-045 95.0 N/A N/A N/A N/A 1 -
--------------------------------------------------------------------------------
[2023-06-27 05:02:02,611] [INFO] GTDB search result was written to GCA_002428565.1_ASM242856v1_genomic.fna/result_gtdb.tsv
[2023-06-27 05:02:02,611] [INFO] ===== GTDB Search completed =====
[2023-06-27 05:02:02,622] [INFO] DFAST_QC result json was written to GCA_002428565.1_ASM242856v1_genomic.fna/dqc_result.json
[2023-06-27 05:02:02,622] [INFO] DFAST_QC completed!
[2023-06-27 05:02:02,622] [INFO] Total running time: 0h1m27s