Most databases provide biological sequences in the multiline (interleaved) FASTA format. On the command line, however, it is much easier to work with a FASTA file in which each sequence spans just a single line.
The following command converts the interleaved format to the single-line format using awk, which is installed by default on most Linux systems:
awk '{if(NR==1) {print $0} else {if($0 ~ /^>/) {print "\n"$0} else {printf "%s", $0}}} END {if(NR>0) print ""}' interleaved.fasta > singleline.fasta
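To see what the conversion does, the following minimal sketch builds a tiny two-record file and runs a slightly restructured variant of the one-liner on it. The file names example.fasta and example.single.fasta and the sequences themselves are made up for illustration. The variant passes each sequence line through printf "%s" so that it is never interpreted as a format string, and it prints a final newline so the last sequence line is properly terminated.

```shell
# Create a tiny interleaved FASTA file (made-up records, for illustration only).
printf '>seq1 example\nACGT\nTTGA\n>seq2\nGGCC\nAA\n' > example.fasta

# Header lines (starting with ">") each get their own line; every header after
# the first is preceded by a newline that ends the previous sequence.
# Sequence lines are appended without a newline, and the END block
# terminates the last sequence.
awk '/^>/ { if (NR > 1) print ""; print; next }
     { printf "%s", $0 }
     END { if (NR > 0) print "" }' example.fasta > example.single.fasta

cat example.single.fasta
# >seq1 example
# ACGTTTGA
# >seq2
# GGCCAA
```

A quick sanity check after converting a real file is that the number of lines should be exactly twice the number of header lines, assuming every record has a non-empty sequence and the file contains no blank lines.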
Last updated on April 26, 2018