[2023-03-14 10:55:58,932] [INFO] DFAST_QC pipeline started.
[2023-03-14 10:55:58,932] [INFO] DFAST_QC version: 0.5.7
[2023-03-14 10:55:58,932] [INFO] DQC Reference Directory: /var/lib/cwl/stg965efef5-2e13-4902-a367-77c53e491995/dqc_reference
[2023-03-14 10:56:00,753] [INFO] ===== Start taxonomy check using ANI =====
[2023-03-14 10:56:00,753] [INFO] Task started: Prodigal
[2023-03-14 10:56:00,754] [INFO] Running command: cat /var/lib/cwl/stg9e69f270-c614-4707-bcfb-73a3ffc6b342/OceanDNA-b30031.fa | prodigal -d OceanDNA-b30031/cds.fna -a OceanDNA-b30031/protein.faa -g 11 -q > /dev/null
[2023-03-14 10:56:24,621] [INFO] Task succeeded: Prodigal
[2023-03-14 10:56:24,621] [INFO] Task started: HMMsearch
[2023-03-14 10:56:24,621] [INFO] Running command: hmmsearch --tblout OceanDNA-b30031/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg965efef5-2e13-4902-a367-77c53e491995/dqc_reference/reference_markers.hmm OceanDNA-b30031/protein.faa > /dev/null
[2023-03-14 10:56:24,929] [INFO] Task succeeded: HMMsearch
[2023-03-14 10:56:24,929] [INFO] Found 6/6 markers.
[2023-03-14 10:56:24,952] [INFO] Query marker FASTA was written to OceanDNA-b30031/markers.fasta
[2023-03-14 10:56:24,952] [INFO] Task started: Blastn
[2023-03-14 10:56:24,953] [INFO] Running command: blastn -query OceanDNA-b30031/markers.fasta -db /var/lib/cwl/stg965efef5-2e13-4902-a367-77c53e491995/dqc_reference/reference_markers.fasta -out OceanDNA-b30031/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-03-14 10:56:25,693] [INFO] Task succeeded: Blastn
[2023-03-14 10:56:25,694] [INFO] Selected 18 target genomes.
[2023-03-14 10:56:25,694] [INFO] Target genome list was written to OceanDNA-b30031/target_genomes.txt
[2023-03-14 10:56:25,707] [INFO] Task started: fastANI
[2023-03-14 10:56:25,707] [INFO] Running command: fastANI --query /var/lib/cwl/stg9e69f270-c614-4707-bcfb-73a3ffc6b342/OceanDNA-b30031.fa --refList OceanDNA-b30031/target_genomes.txt --output OceanDNA-b30031/fastani_result.tsv --threads 1
[2023-03-14 10:56:43,646] [INFO] Task succeeded: fastANI
[2023-03-14 10:56:43,646] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg965efef5-2e13-4902-a367-77c53e491995/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-03-14 10:56:43,647] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg965efef5-2e13-4902-a367-77c53e491995/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-03-14 10:56:43,657] [INFO] Found 18 fastANI hits (0 hits with ANI > threshold)
[2023-03-14 10:56:43,658] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-03-14 10:56:43,658] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Pseudophaeobacter arcticus	strain=DSM 23566	GCA_000473205.1	385492	385492	type	True	83.8687	972	1232	95	below_threshold
Pseudophaeobacter flagellatus	strain=MA21411-1	GCA_021228235.1	2899119	2899119	type	True	81.5671	840	1232	95	below_threshold
Pseudophaeobacter leonis	strain=306	GCA_002087335.1	1144477	1144477	type	True	81.4262	844	1232	95	below_threshold
Leisingera aquaemixtae	strain=CECT 8399	GCA_001458395.1	1396826	1396826	type	True	78.7797	596	1232	95	below_threshold
Leisingera daeponensis	strain=DSM 23529	GCA_000473145.1	405746	405746	type	True	78.7566	595	1232	95	below_threshold
Leisingera aquimarina	strain=DSM 24565	GCA_000473165.1	476529	476529	type	True	78.7088	572	1232	95	below_threshold
Phaeobacter inhibens	strain=DSM 16374	GCA_000473105.1	221822	221822	type	True	78.6869	494	1232	95	below_threshold
Leisingera caerulea	strain=DSM 24564	GCA_000473325.1	506591	506591	type	True	78.6795	590	1232	95	below_threshold
Phaeobacter gallaeciensis	strain=DSM 26640	GCA_000819625.1	60890	60890	type	True	78.5219	509	1232	95	below_threshold
Phaeobacter gallaeciensis	strain=DSM 26640	GCA_000511385.1	60890	60890	type	True	78.5207	503	1232	95	below_threshold
Leisingera methylohalidivorans	strain=DSM 14336; MB2	GCA_000511355.1	133924	133924	type	True	78.5142	593	1232	95	below_threshold
Phaeobacter piscinae	strain=P14	GCA_002407245.1	1580596	1580596	type	True	78.4699	531	1232	95	below_threshold
Phaeobacter porticola	strain=P97	GCA_001888185.1	1844006	1844006	type	True	78.3116	469	1232	95	below_threshold
Zongyanglinia marina	strain=DSW4-44	GCA_005771405.1	2578117	2578117	type	True	77.945	341	1232	95	below_threshold
Zongyanglinia huanghaiensis	strain=CY05	GCA_009753675.1	2682100	2682100	type	True	77.8574	359	1232	95	below_threshold
Ruegeria marisrubri	strain=ZGT118	GCA_001507595.1	1685379	1685379	type	True	77.7145	400	1232	95	below_threshold
Salipiger marinus	strain=DSM 26424	GCA_900100085.1	555512	555512	type	True	77.0127	273	1232	95	below_threshold
Salipiger pallidus	strain=CGMCC 1.15762	GCA_014643635.1	1775170	1775170	type	True	76.8555	178	1232	95	below_threshold
--------------------------------------------------------------------------------
[2023-03-14 10:56:43,658] [INFO] DFAST Taxonomy check result was written to OceanDNA-b30031/tc_result.tsv
[2023-03-14 10:56:43,658] [INFO] ===== Taxonomy check completed =====
[2023-03-14 10:56:43,658] [INFO] ===== Start completeness check using CheckM =====
[2023-03-14 10:56:43,658] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg965efef5-2e13-4902-a367-77c53e491995/dqc_reference/checkm_data
[2023-03-14 10:56:43,659] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-03-14 10:56:43,664] [INFO] Task started: CheckM
[2023-03-14 10:56:43,664] [INFO] Running command: checkm taxonomy_wf --tab_table -f OceanDNA-b30031/cc_result.tsv -t 1 life "Prokaryote" OceanDNA-b30031/checkm_input OceanDNA-b30031/checkm_result
[2023-03-14 10:57:42,105] [INFO] Task succeeded: CheckM
[2023-03-14 10:57:42,106] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 86.36%
Contamination: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2023-03-14 10:57:42,109] [INFO] ===== Completeness check finished =====
[2023-03-14 10:57:42,109] [INFO] ===== Start GTDB Search =====
[2023-03-14 10:57:42,109] [INFO] Query marker FASTA already exists. Will reuse it. (OceanDNA-b30031/markers.fasta)
[2023-03-14 10:57:42,109] [INFO] Task started: Blastn
[2023-03-14 10:57:42,109] [INFO] Running command: blastn -query OceanDNA-b30031/markers.fasta -db /var/lib/cwl/stg965efef5-2e13-4902-a367-77c53e491995/dqc_reference/reference_markers_gtdb.fasta -out OceanDNA-b30031/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-03-14 10:57:43,146] [INFO] Task succeeded: Blastn
[2023-03-14 10:57:43,147] [INFO] Selected 6 target genomes.
[2023-03-14 10:57:43,147] [INFO] Target genome list was written to OceanDNA-b30031/target_genomes_gtdb.txt
[2023-03-14 10:57:43,153] [INFO] Task started: fastANI
[2023-03-14 10:57:43,153] [INFO] Running command: fastANI --query /var/lib/cwl/stg9e69f270-c614-4707-bcfb-73a3ffc6b342/OceanDNA-b30031.fa --refList OceanDNA-b30031/target_genomes_gtdb.txt --output OceanDNA-b30031/fastani_result_gtdb.tsv --threads 1
[2023-03-14 10:57:49,537] [INFO] Task succeeded: fastANI
[2023-03-14 10:57:49,542] [INFO] Found 6 fastANI hits (1 hits with ANI > circumscription radius)
[2023-03-14 10:57:49,542] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_001294455.1	s__Pseudophaeobacter sp001294455	98.1438	1176	1232	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Pseudophaeobacter	95.0	98.13	98.13	0.97	0.97	2	conclusive
GCF_900313025.1	s__Pseudophaeobacter sp900313025	86.1144	1034	1232	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Pseudophaeobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCF_000152965.1	s__Pseudophaeobacter sp000152965	84.9185	953	1232	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Pseudophaeobacter	95.0	98.26	98.26	0.93	0.93	2	-
GCF_000473205.1	s__Pseudophaeobacter arcticus	83.8687	972	1232	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Pseudophaeobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCA_905479575.1	s__Pseudophaeobacter sp905479575	83.6542	896	1232	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Pseudophaeobacter	95.0	N/A	N/A	N/A	N/A	1	-
GCA_015665415.1	s__Pseudophaeobacter sp015665415	82.0165	863	1232	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Pseudophaeobacter	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
[2023-03-14 10:57:49,543] [INFO] GTDB search result was written to OceanDNA-b30031/result_gtdb.tsv
[2023-03-14 10:57:49,543] [INFO] ===== GTDB Search completed =====
[2023-03-14 10:57:49,544] [INFO] DFAST_QC result json was written to OceanDNA-b30031/dqc_result.json
[2023-03-14 10:57:49,544] [INFO] DFAST_QC completed!
[2023-03-14 10:57:49,544] [INFO] Total running time: 0h1m51s
