A Beginner's Guide to NGS Data Analysis
Quality Control, Read Mapping, Visualization and Downstream Analyses

9 - 13 March 2015

iad Pc-Pool, Rosa-Luxemburg-Straße 23, Leipzig, Germany

Scope and Topics

The purpose of this workshop is to give participants a deeper understanding of Next-Generation Sequencing (NGS), with a special focus on bioinformatics issues. In addition, all participants should be enabled to perform important NGS data analysis tasks themselves.

The first workshop module is an introduction to data analysis using Linux, ensuring that all participants are able to follow the practical parts. The second module discusses the advantages and disadvantages of current sequencing technologies and their implications for data analysis. The most important NGS file formats (FASTQ, SAM/BAM, bigWig, etc.) are introduced, followed by first hands-on analyses (QC, mapping, visualization). You will learn how to read and interpret QC plots, clip adapter sequences and/or trim low-quality read ends, get bioinformatics background on read mapping and understand its problems (dynamic programming, alignment visualization, NGS mapping heuristics, etc.), compute your own mapping statistics and visualize your data in different ways (IGV, UCSC, etc.). The last two modules address two specific applications of NGS: RNA-seq of model organisms and RNA-seq of non-model organisms.
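
To give a flavour of the preprocessing step mentioned above, here is a minimal Python sketch of trimming low-quality bases from the 3' end of a single read. It is an illustration only and not part of the course materials: the function names, the quality threshold of 20 and the example read are assumptions, and the Phred+33 quality encoding is assumed. Dedicated trimming tools typically use more elaborate criteria than this simple cutoff.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes the Phred+33 encoding (offset=33); other encodings need
    a different offset.
    """
    return [ord(ch) - offset for ch in quality_string]


def trim_low_quality_end(sequence, quality_string, min_quality=20, offset=33):
    """Remove trailing bases whose Phred score is below min_quality.

    Simplified rule for illustration: cut bases from the 3' end until
    the last remaining base reaches the threshold.
    """
    scores = phred_scores(quality_string, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


if __name__ == "__main__":
    # Hypothetical read: six high-quality bases followed by two poor ones.
    seq, qual = trim_low_quality_end("ACGTACGT", "IIIIII#!")
    print(seq, qual)  # prints: ACGTAC IIIIII
```

The sequence and the quality string are trimmed together so that they keep the same length, which the FASTQ format requires.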

Workshop Structure

The 2015 workshop has been redesigned and adapted to the needs of beginners in the field of NGS bioinformatics. The workshop comprises four course modules, which can be combined.

  1. Linux for Bioinformatics:
    This course module is optional. It will introduce the essential tools and file formats required for NGS data analysis. It helps you overcome the first hurdles of working with this operating system, which is unavoidable for NGS analyses. Every participant who has no background in Linux usage should attend this course!
    (The Linux calls and command-line pipes taught here are the basis for all other courses and cannot be covered again!)
  2. Introduction to NGS data analysis:
    This module is mandatory. Different NGS methods will be explained, the most important terminology will be introduced and first analyses will be performed. This course covers essential knowledge for analysing data from many different NGS applications. It also ensures that all participants have the same level of knowledge for the downstream courses.
  3. RNA-seq Data Analyses:
    Participants can choose at most one of the following options:
    1. RNA-Seq for model organisms
    2. RNA-Seq for non-model organisms
    Depending on the organism you are working with, our trainers will show you what is possible with your data and how you could/should interpret the output data.

Course Prices and Program

Linux for bioinformatics

200 EUR
(industry: 300 EUR)
  • Introduction to the command line and important commands
  • Combining commands by piping and redirection
  • Introduction to bioinformatics file formats (e.g. FASTA, BED, VCF, WIG) and databases (e.g. UCSC, ENSEMBL)
  • Usage of important bioinformatics toolkits (BEDtools, UCSCtools)
  • Introduction to R

Tuesday and Wednesday
Introduction to NGS data analysis

700 EUR
(industry: 900 EUR)
  • Introduction to sequencing technologies from a data analyst's view
  • Raw sequence files (FASTQ format)
  • Preprocessing of raw reads: quality control (FastQC), adapter clipping, quality trimming
  • Introduction to read mapping (Alignment methods, Mapping heuristics)
  • Read mapping (BWA, Bowtie2, STAR, segemehl)
  • Mapping output (SAM/BAM format)
  • Usage of important NGS toolkits (samtools, BEDtools)
  • Mapping statistics
  • Visualization of mapped reads (IGV, UCSC)
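
As a rough idea of what "mapping statistics" means in practice, the following Python sketch counts mapped and unmapped reads from SAM text using the FLAG field (second column). It is an illustrative assumption and not taken from the course materials: the function name and the example records are made up, and in real analyses a dedicated tool such as samtools flagstat would normally be used.

```python
from collections import Counter

# FLAG bits as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def mapping_statistics(sam_lines):
    """Count primary records by mapping status from an iterable of SAM lines."""
    counts = Counter()
    for line in sam_lines:
        # Skip empty lines and header lines (headers start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Secondary and supplementary alignments are tallied separately so
        # that each read is counted only once in the totals below.
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            counts["non_primary"] += 1
            continue
        counts["total"] += 1
        if flag & FLAG_UNMAPPED:
            counts["unmapped"] += 1
        else:
            counts["mapped"] += 1
            if flag & FLAG_REVERSE:
                counts["reverse_strand"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical SAM records, for illustration only.
    example = [
        "@HD\tVN:1.6\tSO:unsorted",
        "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
        "read2\t16\tchr1\t200\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
        "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII",
        "read1\t256\tchr2\t300\t0\t8M\t*\t0\t0\t*\t*",
    ]
    stats = mapping_statistics(example)
    print(f"{stats['mapped']} of {stats['total']} reads mapped")
```

For paired-end data, each mate is a separate record and is counted individually in this sketch.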

Thursday and Friday
RNA-seq Data Analyses

1. RNA-Seq of model organisms
550 EUR*
(industry: 750 EUR)
  • Understand split-read mapping
  • Run different split-read mappers (tophat, segemehl, STAR)
  • Understand the Tuxedo Suite (cufflinks, cuffcompare, cuffmerge, cuffdiff, etc.)
  • Predict new transcripts/isoforms using cufflinks/cuffmerge
  • Quantify exons/genes/transcripts
  • Predict
    • Differential exon usage using DEXSeq
    • Differential gene expression using DESeq
    • Differential isoform expression using cuffdiff
  • Predict non-standard transcripts (circularized RNAs and/or fusion transcripts)

2. RNA-Seq of non-model organisms
550 EUR*
(industry: 750 EUR)
  • Understand different methods for genome assembly (overlap-layout-consensus and de Bruijn graphs)
  • De novo assembly of a non-model organism's transcriptome using different tools (Trinity, Velvet, etc.)
  • Quantification of the de-novo assembled transcripts
  • Predict differentially expressed transcripts using edgeR
  • Find out what genes the predicted transcripts belong to
  • Understand why different assemblers result in different downstream results

This module is not available anymore.

*To attend this course, the "Introduction to NGS data analysis" course has to be taken!


  • Basic understanding of molecular biology (DNA, RNA, gene expression, PCR, ...)
  • For the Introduction to NGS data analysis and downstream courses: basic Linux and bioinformatics knowledge (shell usage, common commands and tools). You should be familiar with the commands covered in the Learning the Shell Tutorial.

Target Audience

  • Biologists or data analysts with little or no experience in analyzing high-throughput sequencing (HTS) data

Included in the Course

  • Course materials
  • Catering
  • Conference Dinner


Gero Doose (University of Leipzig) found and published several circularized RNAs in various RNA-Seq experiments. He specialized in split-read analysis some years ago and has strong expertise in downstream analyses.

Christian Otto (University of Leipzig) is one of the developers of the split-read mapping tool segemehl and is an expert in implementing efficient algorithms for HTS data analyses.

David Langenberger (ecSeq Bioinformatics) started working with small non-coding RNAs in 2006. Since 2009 he has used HTS technologies to investigate these short regulatory RNAs as well as other targets. He has been part of several large HTS projects, for example the International Cancer Genome Consortium (ICGC).

Mario Fasold (ecSeq Bioinformatics) has worked on the analysis of microarray data since 2007 and developed several bioinformatics tools such as the Bioconductor package AffyRNADegradation and the Larpack program package. Since 2011 he has specialized in the field of HTS data analysis and has helped analyse sequencing data of several large consortium projects.

Key Dates

Opening Date of Registration: 10 November 2014
Closing Date of Early Registration: 15 January 2015
Closing Date of Registration: 1 March 2015
Workshop: 9 - 13 March 2015 (8:00 - 17:00)


Location: iad Pc-Pool, Rosa-Luxemburg-Straße 23, Leipzig, Germany
Language: English
Available seats: 24 (first-come, first-served)

Travel expenses and accommodation are not covered by the registration fee.



ecSeq Bioinformatics
04275 Leipzig
Email: events@ecSeq.com

>Registration Closed<