[2023-03-16 16:53:43,343] [INFO] DFAST_QC pipeline started.
[2023-03-16 16:53:43,343] [INFO] DFAST_QC version: 0.5.7
[2023-03-16 16:53:43,344] [INFO] DQC Reference Directory: /var/lib/cwl/stg9b465790-0eee-45b5-a9db-e9a93882662d/dqc_reference
[2023-03-16 16:53:44,469] [INFO] ===== Start taxonomy check using ANI =====
[2023-03-16 16:53:44,470] [INFO] Task started: Prodigal
[2023-03-16 16:53:44,470] [INFO] Running command: cat /var/lib/cwl/stg7b67f5e5-0ba6-4d81-a9cb-1f68cbd85d1b/OceanDNA-b27522.fa | prodigal -d OceanDNA-b27522/cds.fna -a OceanDNA-b27522/protein.faa -g 11 -q > /dev/null
[2023-03-16 16:54:02,624] [INFO] Task succeeded: Prodigal
[2023-03-16 16:54:02,624] [INFO] Task started: HMMsearch
[2023-03-16 16:54:02,624] [INFO] Running command: hmmsearch --tblout OceanDNA-b27522/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg9b465790-0eee-45b5-a9db-e9a93882662d/dqc_reference/reference_markers.hmm OceanDNA-b27522/protein.faa > /dev/null
[2023-03-16 16:54:02,832] [INFO] Task succeeded: HMMsearch
[2023-03-16 16:54:02,833] [WARNING] Found 3/6 markers. [/var/lib/cwl/stg7b67f5e5-0ba6-4d81-a9cb-1f68cbd85d1b/OceanDNA-b27522.fa]
[2023-03-16 16:54:02,877] [INFO] Query marker FASTA was written to OceanDNA-b27522/markers.fasta
[2023-03-16 16:54:02,888] [INFO] Task started: Blastn
[2023-03-16 16:54:02,888] [INFO] Running command: blastn -query OceanDNA-b27522/markers.fasta -db /var/lib/cwl/stg9b465790-0eee-45b5-a9db-e9a93882662d/dqc_reference/reference_markers.fasta -out OceanDNA-b27522/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-03-16 16:54:03,616] [INFO] Task succeeded: Blastn
[2023-03-16 16:54:03,616] [INFO] Selected 16 target genomes.
[2023-03-16 16:54:03,617] [INFO] Target genome list was written to OceanDNA-b27522/target_genomes.txt
[2023-03-16 16:54:03,635] [INFO] Task started: fastANI
[2023-03-16 16:54:03,635] [INFO] Running command: fastANI --query /var/lib/cwl/stg7b67f5e5-0ba6-4d81-a9cb-1f68cbd85d1b/OceanDNA-b27522.fa --refList OceanDNA-b27522/target_genomes.txt --output OceanDNA-b27522/fastani_result.tsv --threads 1
[2023-03-16 16:54:22,272] [INFO] Task succeeded: fastANI
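The fastANI step writes one tab-separated line per query/reference pair: query path, reference path, ANI estimate, number of matched (bidirectionally mapped) fragments, and total query fragments. The `matched_fragments` and `total_fragments` columns in the tables below correspond to the last two fields. The following is a minimal sketch, assuming that five-column layout. It parses such output and computes the aligned fraction per hit. The reference file names in the sample are hypothetical, and the ANI and fragment counts are copied from the first two rows of the taxonomy table.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class AniHit:
    query: str
    reference: str
    ani: float
    matched_fragments: int
    total_fragments: int

    @property
    def aligned_fraction(self) -> float:
        # Share of query fragments that mapped to this reference genome.
        return self.matched_fragments / self.total_fragments


def parse_fastani(text: str) -> list[AniHit]:
    """Parse fastANI tab-separated output (assumed 5 columns per line)."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, ref, ani, matched, total = row[:5]
        hits.append(AniHit(query, ref, float(ani), int(matched), int(total)))
    # Highest ANI first, matching the ordering of the result tables in this log.
    return sorted(hits, key=lambda h: h.ani, reverse=True)


# Hypothetical reference paths; numeric values taken from the table below.
sample = (
    "OceanDNA-b27522.fa\tref_b.fna\t79.728\t346\t687\n"
    "OceanDNA-b27522.fa\tref_a.fna\t79.9919\t333\t687\n"
)
hits = parse_fastani(sample)
for h in hits:
    print(h.reference, h.ani, f"{h.aligned_fraction:.3f}")
```

With only about half of the 687 query fragments mapping to even the closest references, the low aligned fraction is consistent with the low ANI values reported below.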
[2023-03-16 16:54:22,273] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg9b465790-0eee-45b5-a9db-e9a93882662d/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2023-03-16 16:54:22,273] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg9b465790-0eee-45b5-a9db-e9a93882662d/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2023-03-16 16:54:22,282] [INFO] Found 14 fastANI hits (0 hits with ANI > threshold)
[2023-03-16 16:54:22,282] [INFO] The taxonomy check result is classified as 'below_threshold'.
[2023-03-16 16:54:22,282] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Zongyanglinia huanghaiensis	strain=CY05	GCA_009753675.1	2682100	2682100	type	True	79.9919	333	687	95	below_threshold
Zongyanglinia marina	strain=DSW4-44	GCA_005771405.1	2578117	2578117	type	True	79.728	346	687	95	below_threshold
Pseudophaeobacter arcticus	strain=DSM 23566	GCA_000473205.1	385492	385492	type	True	77.5698	180	687	95	below_threshold
Tritonibacter multivorans	strain=CECT 7557	GCA_001458415.1	928856	928856	type	True	77.2462	124	687	95	below_threshold
Tritonibacter multivorans	strain=DSM 26470	GCA_900112515.1	928856	928856	type	True	77.2014	126	687	95	below_threshold
Phaeobacter piscinae	strain=P14	GCA_002407245.1	1580596	1580596	type	True	77.1315	159	687	95	below_threshold
Phaeobacter italicus	strain=CECT 7645	GCA_001258055.1	481446	481446	type	True	76.9705	147	687	95	below_threshold
Phaeobacter italicus	strain=DSM 26436	GCA_900113345.1	481446	481446	type	True	76.9529	149	687	95	below_threshold
Phaeobacter gallaeciensis	strain=DSM 26640	GCA_000511385.1	60890	60890	type	True	76.9188	155	687	95	below_threshold
Phaeobacter gallaeciensis	strain=DSM 26640	GCA_000819625.1	60890	60890	type	True	76.8856	155	687	95	below_threshold
Ruegeria marisrubri	strain=ZGT118	GCA_001507595.1	1685379	1685379	type	True	76.6975	138	687	95	below_threshold
Ruegeria meonggei	strain=CECT 8411	GCA_900172215.1	1446476	1446476	type	True	76.6155	117	687	95	below_threshold
Ruegeria haliotis	strain=B1Z28	GCA_013377785.1	2747601	2747601	type	True	76.3711	112	687	95	below_threshold
Salipiger marinus	strain=DSM 26424	GCA_900100085.1	555512	555512	type	True	76.0029	65	687	95	below_threshold
--------------------------------------------------------------------------------
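Every hit in the table above has an `ani` value below its `ani_threshold` of 95. This matches the log line "Found 14 fastANI hits (0 hits with ANI > threshold)" and the overall 'below_threshold' classification. The sketch below reproduces that count from a tab-separated excerpt of `tc_result.tsv`, assuming it has the header shown above. Only two rows and four columns are kept for brevity. This is an illustration of the comparison, not DFAST_QC's own classification code.

```python
import csv
import io


def count_above_threshold(tsv_text: str) -> tuple[int, int]:
    """Return (total hits, hits whose ANI exceeds their ani_threshold)."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    above = [r for r in rows if float(r["ani"]) > float(r["ani_threshold"])]
    return len(rows), len(above)


# Two rows excerpted from the taxonomy table (other columns trimmed).
sample = (
    "organism_name\tani\tani_threshold\tstatus\n"
    "Zongyanglinia huanghaiensis\t79.9919\t95\tbelow_threshold\n"
    "Salipiger marinus\t76.0029\t95\tbelow_threshold\n"
)
total, above = count_above_threshold(sample)
print(f"Found {total} fastANI hits ({above} hits with ANI > threshold)")
```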
[2023-03-16 16:54:22,283] [INFO] DFAST Taxonomy check result was written to OceanDNA-b27522/tc_result.tsv
[2023-03-16 16:54:22,283] [INFO] ===== Taxonomy check completed =====
[2023-03-16 16:54:22,283] [INFO] ===== Start completeness check using CheckM =====
[2023-03-16 16:54:22,283] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg9b465790-0eee-45b5-a9db-e9a93882662d/dqc_reference/checkm_data
[2023-03-16 16:54:22,284] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2023-03-16 16:54:22,289] [INFO] Task started: CheckM
[2023-03-16 16:54:22,290] [INFO] Running command: checkm taxonomy_wf --tab_table -f OceanDNA-b27522/cc_result.tsv -t 1 life "Prokaryote" OceanDNA-b27522/checkm_input OceanDNA-b27522/checkm_result
[2023-03-16 16:55:04,379] [INFO] Task succeeded: CheckM
[2023-03-16 16:55:04,379] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 55.10%
Contamination: 9.72%
Strain heterogeneity: 100.00%
--------------------------------------------------------------------------------
[2023-03-16 16:55:04,382] [INFO] ===== Completeness check finished =====
[2023-03-16 16:55:04,382] [INFO] ===== Start GTDB Search =====
[2023-03-16 16:55:04,382] [INFO] Query marker FASTA already exists. Will reuse it. (OceanDNA-b27522/markers.fasta)
[2023-03-16 16:55:04,383] [INFO] Task started: Blastn
[2023-03-16 16:55:04,383] [INFO] Running command: blastn -query OceanDNA-b27522/markers.fasta -db /var/lib/cwl/stg9b465790-0eee-45b5-a9db-e9a93882662d/dqc_reference/reference_markers_gtdb.fasta -out OceanDNA-b27522/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2023-03-16 16:55:05,096] [INFO] Task succeeded: Blastn
[2023-03-16 16:55:05,097] [INFO] Selected 11 target genomes.
[2023-03-16 16:55:05,097] [INFO] Target genome list was written to OceanDNA-b27522/target_genomes_gtdb.txt
[2023-03-16 16:55:05,103] [INFO] Task started: fastANI
[2023-03-16 16:55:05,104] [INFO] Running command: fastANI --query /var/lib/cwl/stg7b67f5e5-0ba6-4d81-a9cb-1f68cbd85d1b/OceanDNA-b27522.fa --refList OceanDNA-b27522/target_genomes_gtdb.txt --output OceanDNA-b27522/fastani_result_gtdb.tsv --threads 1
[2023-03-16 16:55:13,071] [INFO] Task succeeded: fastANI
[2023-03-16 16:55:13,078] [INFO] Found 11 fastANI hits (0 hits with ANI > circumscription radius)
[2023-03-16 16:55:13,078] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_011813015.1	s__Parasedimentitalea sp011813015	81.2679	371	687	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Parasedimentitalea	95.0	N/A	N/A	N/A	N/A	1	-
GCF_009753675.1	s__Parasedimentitalea huanghaiensis	79.9919	333	687	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Parasedimentitalea	95.0	N/A	N/A	N/A	N/A	1	-
GCF_005771405.1	s__Parasedimentitalea marina_A	79.728	346	687	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Parasedimentitalea	95.0	97.09	97.09	0.92	0.92	2	-
GCA_002162795.1	s__Parasedimentitalea sp002162795	79.3018	347	687	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Parasedimentitalea	95.0	N/A	N/A	N/A	N/A	1	-
GCA_013214575.1	s__Parasedimentitalea sp013214575	79.2222	327	687	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Parasedimentitalea	95.0	N/A	N/A	N/A	N/A	1	-
GCF_004006175.1	s__Parasedimentitalea marina	78.9258	315	687	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Parasedimentitalea	95.0	N/A	N/A	N/A	N/A	1	-
GCA_002401945.1	s__Parasedimentitalea sp002401945	78.7427	324	687	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Parasedimentitalea	95.0	N/A	N/A	N/A	N/A	1	-
GCF_011806385.1	s__Epibacterium sp011806385	77.0312	129	687	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Epibacterium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_013031355.1	s__Ruegeria arenilitoris_C	76.881	126	687	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ruegeria	95.0	99.51	99.51	0.98	0.98	2	-
GCF_013030985.1	s__Ruegeria arenilitoris_B	76.8336	120	687	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ruegeria	95.0	99.15	99.15	0.92	0.92	2	-
GCF_900172215.1	s__Ruegeria meonggei	76.6153	118	687	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae;g__Ruegeria	95.0	N/A	N/A	N/A	N/A	1	-
--------------------------------------------------------------------------------
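None of the 11 GTDB hits exceeds its `ani_circumscription_radius` of 95.0. However, the hits with the highest ANI all fall in a single genus. The sketch below tallies the genus (the last rank of `gtdb_taxonomy`) and counts the hits above the radius. The lineage strings are trimmed to their final two ranks for brevity. The ANI values are copied from the table above.

```python
from collections import Counter


def genus_of(gtdb_taxonomy: str) -> str:
    # The last rank in the lineage string is the genus, e.g. "g__Ruegeria".
    return gtdb_taxonomy.split(";")[-1].removeprefix("g__")


# (ani, ani_circumscription_radius, trimmed gtdb_taxonomy) from the table above.
gtdb_hits = [
    (81.2679, 95.0, "f__Rhodobacteraceae;g__Parasedimentitalea"),
    (79.9919, 95.0, "f__Rhodobacteraceae;g__Parasedimentitalea"),
    (79.728, 95.0, "f__Rhodobacteraceae;g__Parasedimentitalea"),
    (79.3018, 95.0, "f__Rhodobacteraceae;g__Parasedimentitalea"),
    (79.2222, 95.0, "f__Rhodobacteraceae;g__Parasedimentitalea"),
    (78.9258, 95.0, "f__Rhodobacteraceae;g__Parasedimentitalea"),
    (78.7427, 95.0, "f__Rhodobacteraceae;g__Parasedimentitalea"),
    (77.0312, 95.0, "f__Rhodobacteraceae;g__Epibacterium"),
    (76.881, 95.0, "f__Rhodobacteraceae;g__Ruegeria"),
    (76.8336, 95.0, "f__Rhodobacteraceae;g__Ruegeria"),
    (76.6153, 95.0, "f__Rhodobacteraceae;g__Ruegeria"),
]

genus_counts = Counter(genus_of(tax) for _, _, tax in gtdb_hits)
above = sum(1 for ani, radius, _ in gtdb_hits if ani > radius)
best_genus = genus_of(max(gtdb_hits)[2])  # tuple max compares ANI first
print(genus_counts.most_common(), above, best_genus)
```

A genus-level tally like this is only a rough signal. With 55.10% completeness and ANI around 81% to the closest hit, the query cannot be assigned to any GTDB species here.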
[2023-03-16 16:55:13,079] [INFO] GTDB search result was written to OceanDNA-b27522/result_gtdb.tsv
[2023-03-16 16:55:13,079] [INFO] ===== GTDB Search completed =====
[2023-03-16 16:55:13,080] [INFO] DFAST_QC result json was written to OceanDNA-b27522/dqc_result.json
[2023-03-16 16:55:13,080] [INFO] DFAST_QC completed!
[2023-03-16 16:55:13,080] [INFO] Total running time: 0h1m30s
