Run time-consuming processes in parallel on Unix systems

Using NGS tools like samtools mpileup for a whole genome can take forever. So, handling each chromosome separately and parallely running them on several cores will speed up your pipeline. Using xargs you can easily realize it.

Example usage of xargs (-P is the number of parallel processes started - don't use more than the number of cores you have available):

samtools view -H yourFile.bam | grep "\@SQ" | sed 's/^.*SN://g' | cut -f 1 | xargs -I {} -n 1 -P 24 sh -c "samtools mpileup -BQ0 -d 100000 -uf yourGenome.fa -r {} yourFile.bam | bcftools view -vcg - > tmp.{}.vcf"

To merge the results afterwards, you might want to do something like this:

samtools view -H yourFile.bam | grep "\@SQ" | sed 's/^.*SN://g' | cut -f 1 | perl -ane 'system("cat tmp.$F[0].bcf >> yourFile.vcf");'


Last updated on May 06, 2013

ecSeq is a bioinformatics solution provider with solid expertise in the analysis of high-throughput sequencing data. We organize public workshops and conduct on-site trainings on NGS data analysis.

Would you like to receive updates about our NGS trainings and solutions? Then sign-up for our newsletter