
[de novo genome assembly] Trying out SOAPdenovo

I have picked the genome assembly back up after leaving it untouched since the start of the year. This time the software is SOAPdenovo2.

Reference: De novoアセンブリ関連のツールまとめ(主にショートリード) [Summary of de novo assembly tools (mainly short reads)] - macでインフォマティクス (hatenablog.com)

 

#install

conda create -n soapdenovo -y
conda activate soapdenovo   # on older conda versions: source activate soapdenovo
conda install -c bioconda -y soapdenovo2
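To confirm the install worked, it can help to check that the SOAPdenovo2 executables are actually on PATH inside the activated environment. This is a minimal sketch; it assumes the Bioconda package provides the two builds SOAPdenovo-63mer and SOAPdenovo-127mer, and the function name check_soapdenovo is hypothetical.

```shell
# (bash) Hypothetical helper: report which SOAPdenovo2 executables are on PATH.
# Assumes the bioconda package installs SOAPdenovo-63mer and SOAPdenovo-127mer.
check_soapdenovo() {
  local status=0
  local bin
  for bin in SOAPdenovo-63mer SOAPdenovo-127mer; do
    if command -v "$bin" >/dev/null 2>&1; then
      echo "found: $bin"
    else
      echo "missing: $bin"
      status=1   # remember that at least one executable is missing
    fi
  done
  return "$status"
}
```

Usage: after conda activate soapdenovo, run check_soapdenovo; a non-zero exit status means at least one binary was not found.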

 

Before running SOAPdenovo2, you have to create a configuration file (config_file) that describes your read libraries.

#config_file template: SOAPdenovo2/example.config at master · aquaskyline/SOAPdenovo2 · GitHub
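The template covers every input type, but a single paired-end FASTQ library needs only a handful of keys. As a minimal sketch, such a config can be generated with a heredoc. The keys are taken from the template; the helper name write_config, its argument order, and the choice of rank=1 for a lone library are assumptions for illustration, and the read paths you pass in are your own.

```shell
# (bash) Hypothetical helper: write a SOAPdenovo2 config for ONE paired-end FASTQ library.
# Keys follow the upstream example.config; all values come from the caller,
# except rank=1 (assumption: with a single library it simply gets the first rank).
write_config() {
  local out=$1 read1=$2 read2=$3 max_rd_len=$4 avg_ins=$5
  cat > "$out" <<EOF
#maximal read length
max_rd_len=${max_rd_len}
[LIB]
#average insert size
avg_ins=${avg_ins}
#0 = forward-reverse
reverse_seq=0
#3 = use reads for both contig and scaffold assembly
asm_flags=3
rank=1
pair_num_cutoff=3
map_len=32
#read 1 file must come directly before its read 2 file
q1=${read1}
q2=${read2}
EOF
}
```

Usage, mirroring the values in the example config: write_config config_file /path/to/reads_1.fq /path/to/reads_2.fq 150 25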

 

#config_file example

#maximal read length

max_rd_len=150

[LIB]

#average insert size

avg_ins=25

#if sequence needs to be reversed. The parameter "reverse_seq" should be set to indicate this: 0, forward-reverse; 1, forward-forward.

reverse_seq=0

#in which part(s) the reads are used. It takes value 1(only contig assembly), 2 (only scaffold assembly), 3(both contig and scaffold assembly), or 4 (only gap closure).

asm_flags=3

#use only first 100 bps of each read

rd_len_cutoff=100

#in which order the reads are used while scaffolding

rank=2

# cutoff of pair number for a reliable connection (at least 3 for short insert size)

pair_num_cutoff=3

#minimum aligned length to contigs for a reliable read location (at least 32 for short insert size)

map_len=32

#a pair of fastq file, read 1 file should always be followed by read 2 file

q1=/path/**LIBNAMEA**/fastq1_read_1.fq

q2=/path/**LIBNAMEA**/fastq1_read_2.fq

#another pair of fastq file, read 1 file should always be followed by read 2 file

q1=/path/**LIBNAMEA**/fastq2_read_1.fq

q2=/path/**LIBNAMEA**/fastq2_read_2.fq

#a pair of fasta file, read 1 file should always be followed by read 2 file

f1=/path/**LIBNAMEA**/fasta1_read_1.fa

f2=/path/**LIBNAMEA**/fasta1_read_2.fa

#another pair of fasta file, read 1 file should always be followed by read 2 file

f1=/path/**LIBNAMEA**/fasta2_read_1.fa

f2=/path/**LIBNAMEA**/fasta2_read_2.fa

#fastq file for single reads

q=/path/**LIBNAMEA**/fastq1_read_single.fq

#another fastq file for single reads

q=/path/**LIBNAMEA**/fastq2_read_single.fq

#fasta file for single reads

f=/path/**LIBNAMEA**/fasta1_read_single.fa

#another fasta file for single reads

f=/path/**LIBNAMEA**/fasta2_read_single.fa

#a single fasta file for paired reads

p=/path/**LIBNAMEA**/pairs1_in_one_file.fa

#another single fasta file for paired reads

p=/path/**LIBNAMEA**/pairs2_in_one_file.fa

#bam file for single or paired reads, reads 1 in paired reads file should always be followed by reads 2

#NOTE: If a read in bam file fails platform/vendor quality checks (the flag field 0x0200 is set), it and its paired read will be ignored.

b=/path/**LIBNAMEA**/reads1_in_file.bam

#another bam file for single or paired reads

b=/path/**LIBNAMEA**/reads2_in_file.bam 
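The comments in the template repeat one rule: a read 1 file must always be followed by its read 2 file. A small check can catch a broken pairing before a long run. This is a sketch under the assumption that "followed by" means the next non-comment, non-blank line; the function name check_pairs is hypothetical, and it only checks the q1/q2 and f1/f2 ordering, not whether the files exist.

```shell
# (bash + POSIX awk) Hypothetical checker: every q1= line must be followed by a q2= line,
# and every f1= line by an f2= line, ignoring blank lines and # comments in between.
check_pairs() {
  awk '
    BEGIN { bad = 0; pending = "" }
    /^[ \t]*#/ || /^[ \t]*$/ { next }   # skip comments and blank lines
    {
      if (pending != "") {
        want = (pending == "q1") ? "q2=" : "f2="
        if (index($0, want) != 1) {
          printf "line %d: expected %s after %s=\n", NR, want, pending
          bad = 1
        }
        pending = ""
      }
      if ($0 ~ /^q1=/) pending = "q1"
      else if ($0 ~ /^f1=/) pending = "f1"
    }
    END {
      if (pending != "") { printf "end of file: %s= has no partner\n", pending; bad = 1 }
      exit bad
    }
  ' "$1"
}
```

Usage: check_pairs config_file && echo "pairs OK". Each problem is printed with its line number and the function exits non-zero.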

 

#Run

>>SOAPdenovo-63mer

Version 2.04: released on July 13th, 2012
Compile Feb 14 2018 20:38:54

Usage: SOAPdenovo <command> [option]
  pregraph         construct kmer-graph
  sparse_pregraph  construct sparse kmer-graph
  contig           eliminate errors and output contigs
  map              map reads to contigs
  scaff            construct scaffolds
  all              do pregraph-contig-map-scaff in turn
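As the usage text says, all runs pregraph, contig, map, and scaff in turn. Running the stages one by one makes it easier to rerun a single stage. The sketch below assumes the per-stage pattern from the SOAPdenovo2 README (pregraph takes the config and K, the later stages take the graph prefix via -g); check each stage's options against the help of your installed version. The names run_stages and run_or_print and the DRY_RUN switch are hypothetical additions for previewing the commands.

```shell
# (bash) Hypothetical wrapper: run the four SOAPdenovo2 stages separately.
# Assumed flags: pregraph -s/-K/-R/-o, contig -g/-R, map -s/-g, scaff -g.
# With DRY_RUN=1 the commands are printed instead of executed.
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run_stages() {
  local config=$1 k=$2 prefix=$3 bin=${4:-SOAPdenovo-63mer}
  run_or_print "$bin" pregraph -s "$config" -K "$k" -R -o "$prefix" || return 1
  run_or_print "$bin" contig -g "$prefix" -R || return 1
  run_or_print "$bin" map -s "$config" -g "$prefix" || return 1
  run_or_print "$bin" scaff -g "$prefix" || return 1
}
```

Usage: DRY_RUN=1 run_stages config_file 14 output previews the commands; drop DRY_RUN to actually run them.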

I need the graph construction, the contig step, and the scaffolding, so I ran it with the all command.

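SOAPdenovo-63mer all -s config_file -K 14 -R -o output

(-s: config file, -K: k-mer size, -R: resolve repeats using reads, -o: prefix for the output files)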
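SOAPdenovo2 comes in two builds whose names give the largest k-mer they handle: SOAPdenovo-63mer and SOAPdenovo-127mer. A small helper can pick the right one for a given K. This sketch assumes the limits shown in the program's help text (minimum 13, maximum 63 or 127); the function name pick_soapdenovo is hypothetical.

```shell
# (bash) Hypothetical helper: print the SOAPdenovo2 build that supports a given K.
# Assumed limits: min 13; max 63 for SOAPdenovo-63mer, 127 for SOAPdenovo-127mer.
pick_soapdenovo() {
  local k=$1
  if ! [[ $k =~ ^[0-9]+$ ]]; then
    echo "K must be a positive integer, got '$k'" >&2
    return 1
  fi
  k=$((10#$k))   # force base 10 so values like 014 are not read as octal
  if (( k < 13 || k > 127 )); then
    echo "K=$k is outside the assumed range 13-127" >&2
    return 1
  fi
  if (( k <= 63 )); then
    echo SOAPdenovo-63mer
  else
    echo SOAPdenovo-127mer
  fi
}
```

Usage with the K=14 run above: bin=$(pick_soapdenovo 14) && "$bin" all -s config_file -K 14 -R -o output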

I will add the results in a later update.