[2024-01-24 11:58:57,335] [INFO] DFAST_QC pipeline started.
[2024-01-24 11:58:57,346] [INFO] DFAST_QC version: 0.5.7
[2024-01-24 11:58:57,346] [INFO] DQC Reference Directory: /var/lib/cwl/stg1a22baae-5e7e-4160-a6da-9734e96dce9d/dqc_reference
[2024-01-24 11:58:58,794] [INFO] ===== Start taxonomy check using ANI =====
[2024-01-24 11:58:58,795] [INFO] Task started: Prodigal
[2024-01-24 11:58:58,795] [INFO] Running command: gunzip -c /var/lib/cwl/stgc69bf95c-7a9d-4344-8ade-a72547b2820b/GCF_900184255.1_PRJEB20938_genomic.fna.gz | prodigal -d GCF_900184255.1_PRJEB20938_genomic.fna/cds.fna -a GCF_900184255.1_PRJEB20938_genomic.fna/protein.faa -g 11 -q > /dev/null
[2024-01-24 11:59:05,914] [INFO] Task succeeded: Prodigal
[2024-01-24 11:59:05,915] [INFO] Task started: HMMsearch
[2024-01-24 11:59:05,915] [INFO] Running command: hmmsearch --tblout GCF_900184255.1_PRJEB20938_genomic.fna/hmmer_result.tsv -E 1E-50 /var/lib/cwl/stg1a22baae-5e7e-4160-a6da-9734e96dce9d/dqc_reference/reference_markers.hmm GCF_900184255.1_PRJEB20938_genomic.fna/protein.faa > /dev/null
[2024-01-24 11:59:06,264] [INFO] Task succeeded: HMMsearch
[2024-01-24 11:59:06,266] [INFO] Found 6/6 markers.
[2024-01-24 11:59:06,311] [INFO] Query marker FASTA was written to GCF_900184255.1_PRJEB20938_genomic.fna/markers.fasta
[2024-01-24 11:59:06,311] [INFO] Task started: Blastn
[2024-01-24 11:59:06,311] [INFO] Running command: blastn -query GCF_900184255.1_PRJEB20938_genomic.fna/markers.fasta -db /var/lib/cwl/stg1a22baae-5e7e-4160-a6da-9734e96dce9d/dqc_reference/reference_markers.fasta -out GCF_900184255.1_PRJEB20938_genomic.fna/blast.markers.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 11:59:06,925] [INFO] Task succeeded: Blastn
[2024-01-24 11:59:06,929] [INFO] Selected 28 target genomes.
[2024-01-24 11:59:06,929] [INFO] Target genome list was written to GCF_900184255.1_PRJEB20938_genomic.fna/target_genomes.txt
[2024-01-24 11:59:06,957] [INFO] Task started: fastANI
[2024-01-24 11:59:06,958] [INFO] Running command: fastANI --query /var/lib/cwl/stgc69bf95c-7a9d-4344-8ade-a72547b2820b/GCF_900184255.1_PRJEB20938_genomic.fna.gz --refList GCF_900184255.1_PRJEB20938_genomic.fna/target_genomes.txt --output GCF_900184255.1_PRJEB20938_genomic.fna/fastani_result.tsv --threads 1
[2024-01-24 11:59:34,007] [INFO] Task succeeded: fastANI
[2024-01-24 11:59:34,008] [INFO] Loading species specific ANI threshold from /var/lib/cwl/stg1a22baae-5e7e-4160-a6da-9734e96dce9d/dqc_reference/prokaryote_ANI_species_specific_threshold.txt
[2024-01-24 11:59:34,008] [WARNING] Species-specific ANI threshold file not found. Will use the default threshold for all species. [/var/lib/cwl/stg1a22baae-5e7e-4160-a6da-9734e96dce9d/dqc_reference/prokaryote_ANI_species_specific_threshold.txt]
[2024-01-24 11:59:34,029] [INFO] Found 27 fastANI hits (1 hit with ANI > threshold)
[2024-01-24 11:59:34,029] [INFO] The taxonomy check result is classified as 'conclusive'.
[2024-01-24 11:59:34,030] [INFO] DFAST Taxonomy check final result
--------------------------------------------------------------------------------
organism_name	strain	accession	taxid	species_taxid	relation_to_type	validated	ani	matched_fragments	total_fragments	ani_threshold	status
Haloimpatiens massiliensis	strain=Mt13	GCA_900184255.1	1658110	1658110	type	True	100.0	1393	1394	95	conclusive
Hathewaya massiliensis	strain=Marseille-P3545	GCA_902143515.1	1964382	1964382	type	True	78.6782	191	1394	95	below_threshold
Clostridium felsineum	strain=DSM 793	GCA_002006235.2	36839	36839	type	True	77.6957	158	1394	95	below_threshold
Clostridium putrefaciens	strain=NCTC9836	GCA_900461105.1	99675	99675	type	True	77.5183	158	1394	95	below_threshold
Clostridium swellfunianum	strain=CICC 10730	GCA_023656515.1	1367462	1367462	type	True	77.5157	110	1394	95	below_threshold
Clostridium felsineum	strain=DSM 7320	GCA_002006215.2	36839	36839	type	True	77.5063	146	1394	95	below_threshold
Clostridium carboxidivorans	strain=P7	GCA_001038625.1	217159	217159	type	True	77.4617	260	1394	95	below_threshold
Clostridium botulinum	strain=ATCC 25763	GCA_001276985.1	1491	1491	type	True	77.3932	233	1394	95	below_threshold
Clostridium botulinum	strain=ATCC 25763	GCA_011017965.1	1491	1491	type	True	77.3563	238	1394	95	below_threshold
Clostridium muellerianum	strain=P21	GCA_012926525.1	2716538	2716538	type	True	77.3149	238	1394	95	below_threshold
Clostridium drakei	strain=SL1	GCA_003096175.1	332101	332101	type	True	77.2849	241	1394	95	below_threshold
Clostridium chauvoei	strain=DSM 7528	GCA_002327185.1	46867	46867	type	True	77.2067	148	1394	95	below_threshold
Clostridium isatidis	strain=DSM 15098	GCA_002285495.1	182773	182773	type	True	77.2006	114	1394	95	below_threshold
Clostridium autoethanogenum	strain=DSM 10061	GCA_000484505.2	84023	84023	suspected-type	True	77.1067	150	1394	95	below_threshold
Clostridium faecium	strain=N37	GCA_014836835.1	2762223	2762223	type	True	77.0598	192	1394	95	below_threshold
Clostridium ljungdahlii	strain=DSM 13528	GCA_000143685.1	1538	1538	suspected-type	True	76.9768	154	1394	95	below_threshold
Clostridium polyendosporum	strain=JCM 30710	GCA_018332455.1	69208	69208	type	True	76.8052	115	1394	95	below_threshold
Clostridium drakei	strain=SL1	GCA_000633595.2	332101	332101	type	True	76.7627	223	1394	95	below_threshold
Clostridium felsineum	strain=DSM 794	GCA_002006355.2	36839	36839	type	True	76.7256	147	1394	95	below_threshold
Clostridium autoethanogenum	strain=JA1-1	GCA_002189005.1	84023	84023	suspected-type	True	76.6177	140	1394	95	below_threshold
Clostridium ihumii	strain=AP5	GCA_000612845.1	1470356	1470356	type	True	76.552	226	1394	95	below_threshold
Clostridium senegalense	strain=type strain: JC122	GCA_000285575.1	1465809	1465809	type	True	76.5099	228	1394	95	below_threshold
Clostridium prolinivorans	strain=PYR-10	GCA_004011155.1	2769420	2769420	type	True	76.5052	172	1394	95	below_threshold
Clostridium autoethanogenum	strain=DSM 10061	GCA_000427255.1	84023	84023	suspected-type	True	76.3509	136	1394	95	below_threshold
Clostridium hydrogenum	strain=CUEA01	GCA_021432385.1	2855764	2855764	type	True	76.264	171	1394	95	below_threshold
Clostridium algoriphilum	strain=DSM 16153	GCA_020443705.1	198347	198347	type	True	76.1805	127	1394	95	below_threshold
Clostridium tarantellae	strain=DSM 3997	GCA_009295725.1	39493	39493	type	True	75.8317	162	1394	95	below_threshold
--------------------------------------------------------------------------------
[2024-01-24 11:59:34,031] [INFO] DFAST Taxonomy check result was written to GCF_900184255.1_PRJEB20938_genomic.fna/tc_result.tsv
[2024-01-24 11:59:34,032] [INFO] ===== Taxonomy check completed =====
[2024-01-24 11:59:34,032] [INFO] ===== Start completeness check using CheckM =====
[2024-01-24 11:59:34,033] [INFO] Setting CHECKM_DATA_PATH to /var/lib/cwl/stg1a22baae-5e7e-4160-a6da-9734e96dce9d/dqc_reference/checkm_data
[2024-01-24 11:59:34,034] [INFO] Selected 'Prokaryote' markers (life, taxid=0) for CheckM
[2024-01-24 11:59:34,078] [INFO] Task started: CheckM
[2024-01-24 11:59:34,078] [INFO] Running command: checkm taxonomy_wf --tab_table -f GCF_900184255.1_PRJEB20938_genomic.fna/cc_result.tsv -t 1 life "Prokaryote" GCF_900184255.1_PRJEB20938_genomic.fna/checkm_input GCF_900184255.1_PRJEB20938_genomic.fna/checkm_result
[2024-01-24 12:00:00,065] [INFO] Task succeeded: CheckM
[2024-01-24 12:00:00,067] [INFO] Completeness check finished.
--------------------------------------------------------------------------------
Completeness: 100.00%
Contamination: 0.00%
Strain heterogeneity: 0.00%
--------------------------------------------------------------------------------
[2024-01-24 12:00:00,090] [INFO] ===== Completeness check finished =====
[2024-01-24 12:00:00,090] [INFO] ===== Start GTDB Search =====
[2024-01-24 12:00:00,091] [INFO] Query marker FASTA already exists. Will reuse it. (GCF_900184255.1_PRJEB20938_genomic.fna/markers.fasta)
[2024-01-24 12:00:00,091] [INFO] Task started: Blastn
[2024-01-24 12:00:00,091] [INFO] Running command: blastn -query GCF_900184255.1_PRJEB20938_genomic.fna/markers.fasta -db /var/lib/cwl/stg1a22baae-5e7e-4160-a6da-9734e96dce9d/dqc_reference/reference_markers_gtdb.fasta -out GCF_900184255.1_PRJEB20938_genomic.fna/blast.markers.gtdb.tsv -outfmt 6 -max_hsps 1 -num_alignments 5
[2024-01-24 12:00:00,952] [INFO] Task succeeded: Blastn
[2024-01-24 12:00:00,956] [INFO] Selected 16 target genomes.
[2024-01-24 12:00:00,956] [INFO] Target genome list was written to GCF_900184255.1_PRJEB20938_genomic.fna/target_genomes_gtdb.txt
[2024-01-24 12:00:00,969] [INFO] Task started: fastANI
[2024-01-24 12:00:00,969] [INFO] Running command: fastANI --query /var/lib/cwl/stgc69bf95c-7a9d-4344-8ade-a72547b2820b/GCF_900184255.1_PRJEB20938_genomic.fna.gz --refList GCF_900184255.1_PRJEB20938_genomic.fna/target_genomes_gtdb.txt --output GCF_900184255.1_PRJEB20938_genomic.fna/fastani_result_gtdb.tsv --threads 1
[2024-01-24 12:00:18,481] [INFO] Task succeeded: fastANI
[2024-01-24 12:00:18,494] [INFO] Found 15 fastANI hits (1 hit with ANI > circumscription radius)
[2024-01-24 12:00:18,495] [INFO] GTDB search result
--------------------------------------------------------------------------------
accession	gtdb_species	ani	matched_fragments	total_fragments	gtdb_taxonomy	ani_circumscription_radius	mean_intra_species_ani	min_intra_species_ani	mean_intra_species_af	min_intra_species_af	num_clustered_genomes	status
GCF_900184255.1	s__Haloimpatiens massiliensis	100.0	1392	1394	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Haloimpatiens	95.0	N/A	N/A	N/A	N/A	1	conclusive
GCF_000744935.1	s__Haloimpatiens sp000744935	92.1083	1109	1394	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Haloimpatiens	95.0	N/A	N/A	N/A	N/A	1	-
GCF_901447495.1	s__Haloimpatiens lingqiaonensis	87.0664	791	1394	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Haloimpatiens	95.0	N/A	N/A	N/A	N/A	1	-
GCF_902143515.1	s__Hathewaya massiliensis	78.639	191	1394	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Hathewaya	95.0	N/A	N/A	N/A	N/A	1	-
GCF_900461105.1	s__Clostridium_L putrefaciens	77.5727	158	1394	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_L	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002285495.1	s__Clostridium isatidis	77.284	116	1394	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium	95.0	99.19	99.19	0.81	0.81	2	-
GCF_003614235.1	s__Clostridium_H haemolyticum	77.2603	182	1394	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_H	95.0	98.44	97.38	0.90	0.84	13	-
GCF_001276215.1	s__Clostridium_F sp001276215	77.218	245	1394	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_F	95.0	97.32	95.44	0.88	0.82	10	-
GCF_001758365.1	s__Clostridium_C acetireducens	76.9483	185	1394	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_C	95.0	N/A	N/A	N/A	N/A	1	-
GCF_004006395.2	s__Clostridium_B sp004006395	76.8684	164	1394	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_B	95.0	N/A	N/A	N/A	N/A	1	-
GCF_018332455.1	s__Clostridium_AR polyendosporum	76.7537	117	1394	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_AR	95.0	N/A	N/A	N/A	N/A	1	-
GCA_002341865.1	s__Clostridium_J sp002341865	76.7102	186	1394	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_J	95.0	N/A	N/A	N/A	N/A	1	-
GCF_001636845.1	s__Clostridium_B ljungdahlii_A	76.5696	158	1394	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_B	95.0	N/A	N/A	N/A	N/A	1	-
GCF_018918055.1	s__Clostridium sp018918055	76.4414	96	1394	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium	95.0	N/A	N/A	N/A	N/A	1	-
GCF_002006355.1	s__Clostridium_S felsineum	76.1061	132	1394	d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium_S	95.0	98.29	98.24	0.88	0.87	4	-
--------------------------------------------------------------------------------
[2024-01-24 12:00:18,497] [INFO] GTDB search result was written to GCF_900184255.1_PRJEB20938_genomic.fna/result_gtdb.tsv
[2024-01-24 12:00:18,499] [INFO] ===== GTDB Search completed =====
[2024-01-24 12:00:18,504] [INFO] DFAST_QC result json was written to GCF_900184255.1_PRJEB20938_genomic.fna/dqc_result.json
[2024-01-24 12:00:18,504] [INFO] DFAST_QC completed!
[2024-01-24 12:00:18,504] [INFO] Total running time: 0h1m21s
