Convert nucleotide sequences with IUPAC codes to a regular expression

This online tool generates a regular expression from nucleotide sequences that may include IUPAC ambiguity codes. You can then use any string or pattern search program (e.g. the Linux command-line tool grep) to extract every occurrence of a given consensus sequence from a large file, for example a FASTA/FASTQ file obtained from a next-generation sequencing experiment.



Example Usage

Consensus nucleotide sequence with IUPAC codes, as extracted from a genome browser:

GCNATAACTMTGTHC

Regular expression with ambiguous IUPAC characters resolved:

GC[ACGT]ATAACT[AC]TGT[ACT]C
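
The conversion shown above can be reproduced with a short script. The following is a minimal sketch, not the implementation behind this tool: the names IUPAC_TO_BASES and iupac_to_regex are hypothetical, only the standard DNA ambiguity codes are covered, and any other character is rejected with a KeyError.

```python
# Standard IUPAC nucleotide codes mapped to the bases they stand for (DNA only).
IUPAC_TO_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def iupac_to_regex(sequence):
    """Return a regular expression matching every sequence the IUPAC string allows."""
    parts = []
    for code in sequence.upper():
        bases = IUPAC_TO_BASES[code]  # raises KeyError for non-IUPAC characters
        # Unambiguous bases stay as they are; ambiguous ones become a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


print(iupac_to_regex("GCNATAACTMTGTHC"))  # prints GC[ACGT]ATAACT[AC]TGT[ACT]C
```

Each ambiguity code expands to a bracketed character class, so the resulting pattern works unchanged in grep and in most other regular expression engines.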

Finding the sequence in a FASTQ file on the command line:

grep "GC[ACGT]ATAACT[AC]TGT[ACT]C" SAMPLE_1.fastq
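
grep prints every line that matches, including header or quality lines that happen to contain the pattern. If only the reads themselves should be searched, the same regular expression can be applied to the sequence lines alone. The sketch below assumes the common four-line FASTQ layout (header, sequence, "+", quality) with unwrapped sequences; the function name matching_reads and the example reads are made up for illustration.

```python
import re


def matching_reads(fastq_lines, pattern):
    """Return the reads whose sequence line contains a match for pattern.

    Assumes four lines per record, so the sequence is the second line
    of every group of four.
    """
    regex = re.compile(pattern)
    hits = []
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1:  # sequence line of the current record
            read = line.strip()
            if regex.search(read):
                hits.append(read)
    return hits


# Two invented records; only the first contains the consensus sequence.
example = [
    "@read1", "TTGCAATAACTCTGTACGG", "+", "IIIIIIIIIIIIIIIIIII",
    "@read2", "TTTTTTTTTTTTTTTTTTT", "+", "IIIIIIIIIIIIIIIIIII",
]
print(matching_reads(example, "GC[ACGT]ATAACT[AC]TGT[ACT]C"))
# prints ['TTGCAATAACTCTGTACGG']
```

To run it on a real file, pass an open file handle instead of the list, for example the handle returned by open("SAMPLE_1.fastq").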


Last updated on August 07, 2016

ecSeq is a bioinformatics solution provider with solid expertise in the analysis of high-throughput sequencing data. We organize public workshops and conduct on-site trainings on NGS data analysis.
