Convert nucleotide sequences with IUPAC codes to a regular expression

This online tool generates a regular expression from nucleotide sequences that may include IUPAC ambiguity codes. This allows you to use any string or pattern search program (e.g. the Linux command-line tool grep) to extract reads matching a given consensus sequence from a large file, for example a FASTA/FASTQ file obtained from a next-generation sequencing experiment.

Example Usage

Consensus nucleotide sequence with IUPAC codes, as extracted from a genome browser:

GCNATAACTMTGTHC

Regular expression with ambiguous IUPAC characters resolved:

GC[ACGT]ATAACT[AC]TGT[ACT]C
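
The conversion shown above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not the code behind the online tool: the names IUPAC_BASES and iupac_to_regex are hypothetical, and the table uses the standard IUPAC nucleotide ambiguity codes for DNA.

```python
# Standard IUPAC nucleotide codes mapped to the bases they represent (DNA only).
# IUPAC_BASES and iupac_to_regex are illustrative names, not part of the tool.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(sequence):
    """Return a regular expression matching every sequence the IUPAC string allows."""
    parts = []
    for code in sequence.upper():
        bases = IUPAC_BASES[code]  # raises KeyError for characters outside the table
        # Unambiguous bases stay as they are; ambiguous codes become a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


print(iupac_to_regex("GCNATAACTMTGTHC"))  # prints GC[ACGT]ATAACT[AC]TGT[ACT]C
```

Because every replacement consists only of A, C, G, T and square brackets, the resulting pattern is valid in grep as well as in most other regular expression engines.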

Finding the sequence in a FASTQ file on the command line:

grep "GC[ACGT]ATAACT[AC]TGT[ACT]C" SAMPLE_1.fastq
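
grep tests every line of the file, including read headers and quality strings. Where only the sequence lines should be searched, the same pattern can be applied from Python. The sketch below assumes the common four-line FASTQ layout (header, sequence, "+", quality); matching_reads is a hypothetical helper name, and the two reads are made-up example data.

```python
import re
from itertools import islice


def matching_reads(fastq_lines, pattern):
    """Yield read sequences that contain a match for the given regular expression.

    Assumes four-line FASTQ records, so the sequence is the 2nd line of each record.
    """
    regex = re.compile(pattern)
    # Start at the second line and take every fourth line from there.
    for line in islice(fastq_lines, 1, None, 4):
        seq = line.strip()
        if regex.search(seq):
            yield seq


# Made-up example records; with a real file, pass an open file handle instead.
example = [
    "@read1", "TTGCAATAACTATGTACGG", "+", "IIIIIIIIIIIIIIIIIII",
    "@read2", "ACGTACGTACGT", "+", "IIIIIIIIIIII",
]
hits = list(matching_reads(example, "GC[ACGT]ATAACT[AC]TGT[ACT]C"))
print(hits)  # prints ['TTGCAATAACTATGTACGG']
```

For a file such as SAMPLE_1.fastq, the list can be replaced by a handle from open("SAMPLE_1.fastq"), since the function only iterates over lines.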

About us

ecSeq is a bioinformatics solution provider with solid expertise in the analysis of high-throughput sequencing data. We can help you get the most out of your sequencing experiments by developing data analysis strategies and providing expert consulting. We organize public workshops and conduct on-site trainings on NGS data analysis.

Last updated on August 07, 2016