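I have resumed the genome assembly work that I had left untouched since the start of the year. This time the software is SOAPdenovo2.
Reference: "De novoアセンブリ関連のツールまとめ(主にショートリード)" (a roundup of de novo assembly tools, mainly for short reads) - macでインフォマティクス (hatenablog.com)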
#install
conda create -n soapdenovo -y
conda activate soapdenovo  # on older conda versions: source activate soapdenovo
conda install -c bioconda -y soapdenovo2
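After installing, it can help to confirm that the executables are actually visible in the activated environment. The following is a minimal sketch; it assumes the bioconda package puts `SOAPdenovo-63mer` and `SOAPdenovo-127mer` on PATH, and `check_soapdenovo` is just a helper name made up for this example.

```shell
# Report which SOAPdenovo2 executables are visible on PATH.
# Assumption: the package provides SOAPdenovo-63mer and SOAPdenovo-127mer.
check_soapdenovo() {
    missing=0
    for bin in SOAPdenovo-63mer SOAPdenovo-127mer; do
        if command -v "$bin" >/dev/null 2>&1; then
            echo "found: $bin"
        else
            echo "missing: $bin"
            missing=1
        fi
    done
    # Non-zero exit status if at least one executable was not found.
    return "$missing"
}

check_soapdenovo || echo "activate the soapdenovo environment first"
```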
This software requires a config_file to be created before it can run.
#config_file recipe: SOAPdenovo2/example.config at master · aquaskyline/SOAPdenovo2 · GitHub
#config_file example
#maximal read length
max_rd_len=150
[LIB]
#average insert size
avg_ins=25
#if sequence needs to be reversed. The parameter "reverse_seq" should be set to indicate this: 0, forward-reverse; 1, forward-forward.
reverse_seq=0
#in which part(s) the reads are used. It takes value 1(only contig assembly), 2 (only scaffold assembly), 3(both contig and scaffold assembly), or 4 (only gap closure).
asm_flags=3
#use only first 100 bps of each read
rd_len_cutoff=100
#in which order the reads are used while scaffolding
rank=2
# cutoff of pair number for a reliable connection (at least 3 for short insert size)
pair_num_cutoff=3
#minimum aligned length to contigs for a reliable read location (at least 32 for short insert size)
map_len=32
#a pair of fastq file, read 1 file should always be followed by read 2 file
q1=/path/**LIBNAMEA**/fastq1_read_1.fq
q2=/path/**LIBNAMEA**/fastq1_read_2.fq
#another pair of fastq file, read 1 file should always be followed by read 2 file
q1=/path/**LIBNAMEA**/fastq2_read_1.fq
q2=/path/**LIBNAMEA**/fastq2_read_2.fq
#a pair of fasta file, read 1 file should always be followed by read 2 file
f1=/path/**LIBNAMEA**/fasta1_read_1.fa
f2=/path/**LIBNAMEA**/fasta1_read_2.fa
#another pair of fasta file, read 1 file should always be followed by read 2 file
f1=/path/**LIBNAMEA**/fasta2_read_1.fa
f2=/path/**LIBNAMEA**/fasta2_read_2.fa
#fastq file for single reads
q=/path/**LIBNAMEA**/fastq1_read_single.fq
#another fastq file for single reads
q=/path/**LIBNAMEA**/fastq2_read_single.fq
#fasta file for single reads
f=/path/**LIBNAMEA**/fasta1_read_single.fa
#another fasta file for single reads
f=/path/**LIBNAMEA**/fasta2_read_single.fa
#a single fasta file for paired reads
p=/path/**LIBNAMEA**/pairs1_in_one_file.fa
#another single fasta file for paired reads
p=/path/**LIBNAMEA**/pairs2_in_one_file.fa
#bam file for single or paired reads, reads 1 in paired reads file should always be followed by reads 2
#NOTE: If a read in bam file fails platform/vendor quality checks (the flag field 0x0200 is set), the read and its paired read are ignored.
b=/path/**LIBNAMEA**/reads1_in_file.bam
#another bam file for single or paired reads
b=/path/**LIBNAMEA**/reads2_in_file.bam
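The template above lists every supported input type, but a single paired-end library needs only a few of those lines. The sketch below writes such a reduced config_file with a here-document. The read paths and the insert size are hypothetical placeholders, not values from my data, and `rank=1` is chosen only because there is a single library.

```shell
# Hypothetical inputs: replace with your own files and library values.
READ1="reads/sample_R1.fq"   # hypothetical path to read 1
READ2="reads/sample_R2.fq"   # hypothetical path to read 2
AVG_INS=300                  # hypothetical; use your library's insert size
CONFIG="config_file"

# Unquoted EOF so the shell variables above are expanded into the file.
cat > "$CONFIG" <<EOF
#maximal read length
max_rd_len=150
[LIB]
#average insert size
avg_ins=${AVG_INS}
#0: forward-reverse, 1: forward-forward
reverse_seq=0
#3: use reads for both contig and scaffold assembly
asm_flags=3
#order in which libraries are used while scaffolding
rank=1
#a pair of fastq files, read 1 followed by read 2
q1=${READ1}
q2=${READ2}
EOF

echo "wrote $CONFIG"
```

Further libraries would each get their own [LIB] block with their own avg_ins and rank.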
#run
>>SOAPdenovo-63mer
Version 2.04: released on July 13th, 2012
Compile Feb 14 2018 20:38:54
Usage: SOAPdenovo <command> [option]
    pregraph          construct kmer-graph
    sparse_pregraph   construct sparse kmer-graph
    contig            eliminate errors and output contigs
    map               map reads to contigs
    scaff             construct scaffolds
    all               do pregraph-contig-map-scaff in turn
I need graph construction, contig processing, and scaffolding, so I ran the all command.
SOAPdenovo-63mer all -s config_file -K 14 -R -o output
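According to the usage text, `all` runs pregraph, contig, map, and scaff in turn, so the same work can also be done stage by stage, which makes it easier to rerun one stage. The sketch below only prints the four commands by default. The per-stage flags follow the step-by-step form as I understand it from the upstream README, so treat them as an assumption and check each subcommand's usage before relying on them. `KMER=63` is a placeholder, and `run_step` and `soap_stages` are helper names made up for this example.

```shell
# Print (default) or run the four stages that `all` chains together.
# Set DRY_RUN=0 to execute; that requires SOAPdenovo-63mer on PATH.
SOAP_BIN="SOAPdenovo-63mer"
CONFIG="config_file"
PREFIX="output"
KMER=63                      # placeholder k-mer size, not a recommendation
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, otherwise execute it.
run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumed per-stage flags; verify against each subcommand's usage text.
soap_stages() {
    run_step "$SOAP_BIN" pregraph -s "$CONFIG" -K "$KMER" -R -o "$PREFIX" &&
    run_step "$SOAP_BIN" contig -g "$PREFIX" -R &&
    run_step "$SOAP_BIN" map -s "$CONFIG" -g "$PREFIX" &&
    run_step "$SOAP_BIN" scaff -g "$PREFIX" -F
}

soap_stages
```

Because the stages are joined with `&&`, a failure in one stage stops the later ones from running.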
I will add the results in a later update.