With the advent of efficient strategies for experimental RNA structure validation, especially the combination of chemical probing with next-generation sequencing technologies, came the demand to couple experimental data, e.g. SHAPE reactivities, with in silico RNA structure prediction tools. In this approach, computational structure prediction is guided by in vitro or even in vivo probing data.

We have recently implemented three previously published methods for incorporating SHAPE probing data into the Vienna RNA Package and benchmarked the prediction results on a set of RNAs with known reference structures.

Don’t miss the Supplementary Data since it contains extensive coverage of the applied benchmark strategies and lots of background information.

SHAPE directed RNA folding
Ronny Lorenz, Dominik Luntzer, Ivo L. Hofacker, Peter F. Stadler, Michael T. Wolfinger
Bioinformatics 2015 btv523
DOI: 10.1093/bioinformatics/btv523

Abstract

Summary: Chemical mapping experiments allow for nucleotide resolution assessment of RNA structure. We demonstrate that different strategies of integrating probing data with thermodynamics-based RNA secondary structure prediction algorithms can be implemented by means of soft constraints. This amounts to incorporating suitable pseudo-energies into the standard energy model for RNA secondary structures. As a showcase application for this new feature of the ViennaRNA Package we compare three distinct, previously published strategies to utilize SHAPE reactivities for structure prediction. The new tool is benchmarked on a set of RNAs with known reference structure.

Availability and implementation: The capability for SHAPE directed RNA folding is part of the upcoming release of the ViennaRNA Package 2.2, for which a preliminary release is already freely available at http://www.tbi.univie.ac.at/RNA.

mRNA degradation and translation are two processes crucial for posttranscriptional gene regulation. The strong interconnection of mRNA degradation and translation has been noted for many years and has led to the hypothesis that mRNAs could be degraded on the ribosome.

We have recently published a study that strongly supports the hypothesis that the ribosome is a very general site not only for general 5' to 3' mRNA degradation in Drosophila but also for the miRNA-mediated mRNA degradation pathway:

General and miRNA-mediated mRNA degradation occurs on ribosome complexes in Drosophila cells
Sanja Antic, Michael T. Wolfinger, Anna Skucha, Stefanie Hosiner and Silke Dorner
Mol. Cell. Biol. 2015 vol. 35 no. 13 2309-2320
DOI: 10.1128/MCB.01346-14

Abstract

The translation and degradation of mRNAs are two key steps in gene expression that are highly regulated and targeted by many factors, including microRNAs (miRNAs). While it is well established that translation and mRNA degradation are tightly coupled, it is still not entirely clear where in the cell mRNA degradation takes place. In this study, we investigated the possibility of mRNA degradation on the ribosome in Drosophila cells. Using polysome profiles and ribosome affinity purification, we could demonstrate the copurification of various deadenylation and decapping factors with ribosome complexes. Also, AGO1 and GW182, two key factors in the miRNA-mediated mRNA degradation pathway, were associated with ribosome complexes. Their copurification was dependent on intact mRNAs, suggesting the association of these factors with the mRNA rather than the ribosome itself. Furthermore, we isolated decapped mRNA degradation intermediates from ribosome complexes and performed high-throughput sequencing analysis. Interestingly, 93% of the decapped mRNA fragments (approximately 12,000) could be detected at the same relative abundance on ribosome complexes and in cell lysates. In summary, our findings strongly indicate the association of the majority of bulk mRNAs as well as mRNAs targeted by miRNAs with the ribosome during their degradation.

ViennaNGS is a Perl distribution for rapid development of next-generation sequencing analysis pipelines. It comes with a set of modules and Moose-based classes for accomplishing standard and non-standard tasks in NGS processing. ViennaNGS' feature-richness, however, comes at some cost: dependencies on third-party Perl modules and external tools and libraries, which can be tedious to install. I have therefore compiled this little ViennaNGS Installation HOWTO to help prospective users get the software up and running quickly on their systems.

Let’s get started. The installation process provided here has been prepared for Linux systems; however, the steps should be reproducible on Mac OS X, provided that you have the common GNU tools and Perl 5.10.0 or higher available, e.g. via MacPorts.

First things first: third party dependencies

ViennaNGS depends on a set of third-party bioinformatics tools and libraries, which are required either for specific filtering and file format conversion tasks or for building internally used Perl modules. Before continuing with the ViennaNGS installation, you will need to download and install the following software and ensure that all executables are accessible to the Perl interpreter:

The last dependency is in fact a bit peculiar. Bio::DB::Sam up to version 1.41, which we use for all kinds of direct BAM manipulation within ViennaNGS, for some reason will not compile with recent, HTSlib-based versions of samtools. samtools 0.1.19 seems to be the last version that is compatible with current versions of Bio::DB::Sam. So let’s download samtools 0.1.19 to a local folder, say /scratch/software, uncompress the tarball and compile the samtools library:

mkdir -p /scratch/software
cd /scratch/software
wget https://github.com/samtools/samtools/archive/0.1.19.tar.gz
tar xvf 0.1.19.tar.gz
cd samtools-0.1.19
make CFLAGS="-g -O2 -Wall -Wno-unused -Wno-unused-result -fPIC"

The extra -Wno* CFLAGS options suppress some unused variable warnings with recent versions of gcc, whereas the -fPIC CFLAGS option prevents a relocation error on x86_64 architectures (see also the Bio::DB::Sam README). If everything works out fine, samtools is now compiled, and a static library file libbam.a will be created in the current working directory. Remember the absolute path to libbam.a (in our example this is /scratch/software/samtools-0.1.19), since we will need this later for compiling Bio::DB::Sam.
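Before moving on, it can be worth confirming that the build directory really contains the two files the Bio::DB::Sam installer asks for later (bam.h and libbam.a). The following is a minimal sketch; check_samtools_dir is a hypothetical helper name introduced here for illustration and is not part of samtools or Bio::DB::Sam.

```shell
# Sketch: verify that a samtools build directory holds bam.h and libbam.a.
# check_samtools_dir is a hypothetical helper, not shipped with samtools.
check_samtools_dir() {
    local dir="$1"
    local f
    local missing=0
    for f in bam.h libbam.a; do
        if [ ! -f "${dir}/${f}" ]; then
            echo "missing: ${dir}/${f}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

# Example call with the directory used in this HOWTO
if check_samtools_dir /scratch/software/samtools-0.1.19; then
    echo "samtools build looks complete"
else
    echo "samtools build incomplete, consider re-running make"
fi
```

If the check fails, the most likely cause is that make stopped with an error before the library was archived.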

Setting up a local Perl module directory with cpanminus

Once all third-party dependencies are installed, we can focus on the Perl dependencies of ViennaNGS, e.g. the BioPerl suite, the Moose object framework and many more. Fortunately, installation of custom Perl modules (i.e. those that are not considered Perl core modules or shipped with a Linux distribution) is a fairly trivial task these days, thanks to a tool called cpanminus. cpanminus renders installation of a custom Perl module as easy as typing cpanm and the module’s name on the command line. Moreover, it provides automatic dependency resolution, which comes in handy when you’re installing Perl modules or distributions that depend on other modules, which again depend on a bunch of modules etc. Forget about the CPAN shell or cpanplus, use cpanminus.

Before starting with cpanminus we’ll need to set a few environment variables that tell cpanm and the Perl interpreter where to keep and look for locally installed Perl modules. We’ll stick with the cpanminus default location, ${HOME}/perl5, and add the following lines to the end of .bashrc:

export PERL_MM_OPT="INSTALL_BASE=${HOME}/perl5"
export PERL_MB_OPT="--install_base ${HOME}/perl5"
export PERL5LIB=${HOME}/perl5/lib/perl5:${PERL5LIB}

We will also modify our PATH variable to include ${HOME}/perl5/bin, where executable scripts that are shipped with custom Perl modules will be installed:

export PATH=${HOME}/perl5/bin:${PATH}
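After opening a new shell (or re-sourcing .bashrc), a quick check can confirm that both variables contain the local Perl directories. This is only a sketch; path_contains is a hypothetical helper name that tests whether a directory is an element of a colon-separated list.

```shell
# Sketch: test whether a directory is listed in a colon-separated variable.
# path_contains is a hypothetical helper introduced for illustration.
path_contains() {
    local list="$1"
    local dir="$2"
    case ":${list}:" in
        *":${dir}:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the two variables set in .bashrc above
if path_contains "${PERL5LIB:-}" "${HOME:-}/perl5/lib/perl5"; then
    echo "PERL5LIB includes the local module directory"
else
    echo "PERL5LIB does not include the local module directory yet"
fi

if path_contains "${PATH:-}" "${HOME:-}/perl5/bin"; then
    echo "PATH includes the local script directory"
else
    echo "PATH does not include the local script directory yet"
fi
```

Wrapping the list in colons before matching avoids false positives from directories that merely share a prefix.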

We can now install cpanminus, either the version shipped with our Linux distribution (the example is for Debian/Ubuntu and must be executed with root privileges)

apt-get install cpanminus

or directly from https://cpanmin.us

curl -L https://cpanmin.us | perl - App::cpanminus
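Whichever route you choose, it may be useful to confirm that the cpanm executable is actually found on your PATH before continuing. A minimal sketch follows; require_tool is a hypothetical helper name, and the same function could be reused for any of the third-party executables mentioned earlier.

```shell
# Sketch: report whether an executable can be found on PATH.
# require_tool is a hypothetical helper introduced for illustration.
require_tool() {
    local tool="$1"
    if command -v "${tool}" >/dev/null 2>&1; then
        return 0
    fi
    echo "required tool not found on PATH: ${tool}" >&2
    return 1
}

if require_tool cpanm; then
    echo "cpanm found at $(command -v cpanm)"
else
    echo "cpanm is not installed yet"
fi
```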

Installing the ViennaNGS Perl dependencies

We are now ready to install the ViennaNGS Perl dependencies. Let’s start with Bio::DB::Sam:

cpanm Bio::DB::Sam

Depending on your system setup, cpanm will now start to download and install all dependencies of Bio::DB::Sam and will then ask for the location of the static libbam.a library:

Configuring Bio-SamTools-1.41 ... Please enter the location of the bam.h and compiled libbam.a files:

Our samtools folder is /scratch/software/samtools-0.1.19 (see above), so let’s enter this here:

/scratch/software/samtools-0.1.19

This was basically the only module that we had to install manually due to the samtools location constraint. Actually, we could have installed Bio::ViennaNGS directly and cpanm would have asked for the samtools location, but I thought this was a good chance to demonstrate cpanm usage for didactic reasons.

Installing Bio::ViennaNGS

Here comes the easiest part. Since we’ve got cpanm up and running on our system, all we need to type is

cpanm Bio::ViennaNGS

and wait for cpanm to install all dependencies and finish. Beware that ViennaNGS depends on Statistics::R, which requires a running installation of the R Statistics Package on your system. If for any reason cpanm fails (e.g. R is not available), it will stop and report an error. You can then install the missing component and call the above command again. Moreover, whenever there is a new ViennaNGS release available on CPAN, just type cpanm Bio::ViennaNGS and you’ll get the latest version in just a couple of seconds.

Congratulations! You now have a running installation of Bio::ViennaNGS. If you stuck with the paths in our example, the modules/classes are located in ${HOME}/perl5/lib/perl5/Bio/ViennaNGS/lib and the ViennaNGS utilities can be found in ${HOME}/perl5/bin.

The ViennaNGS authors are looking forward to getting your feedback. Please use the ViennaNGS GitHub Issue Tracker for reporting bugs.

ViennaNGS is a Perl distribution for building efficient NGS data and analysis pipelines, integrating high-level routines and wrapper functions for common NGS processing tasks. While ViennaNGS is not an established pipeline per se, it provides tools and functionality for the development of custom NGS pipelines in Perl. ViennaNGS comes with a set of utility scripts that serve as reference implementation for most library functions and can readily be applied for specific tasks or integrated as-is into custom pipelines.

ViennaNGS covers a broad range of NGS data processing tasks, including functionality for extracting and converting features from common NGS file formats, computation and evaluation of read mapping statistics, quantification and normalization of read count data, identification and characterization of splice junctions from RNA-seq data, parsing and condensing sequence motif data, automated construction of Assembly and Track Hubs for the UCSC genome browser and wrapper routines for a set of commonly used NGS command line tools.

We have recently published the ViennaNGS paper at F1000Research:

ViennaNGS: A toolbox for building efficient next-generation sequencing analysis pipelines
Michael T. Wolfinger, Jörg Fallmann, Florian Eggenhofer, Fabian Amman
F1000Research 2015,4:50
DOI: 10.12688/f1000research.6157.1

The ViennaNGS suite is available through GitHub (https://github.com/mtw/Bio-ViennaNGS) and CPAN (http://search.cpan.org/dist/Bio-ViennaNGS).

Discrete energy landscapes provide a valuable means for analyzing non-equilibrium properties of biopolymers. RNA folding dynamics, for example, can be described by a continuous-time Markov process at the level of local minima, their corresponding basins of attraction and saddle points connecting them.

A connected set of structures, often referred to as the state space, is required for energy landscape construction. While complete suboptimal folding of RNA is practically impossible for chain lengths above 100 nt, alternative strategies to enumerate the lower part of the energy landscape have emerged over the last years.

We have recently extended previous work on global flooding by a local flooding approach that minimizes memory consumption and published the method in Bioinformatics.

Memory-efficient RNA energy landscape exploration
Martin Mann, Marcel Kucharík, Christoph Flamm, Michael T. Wolfinger
Bioinformatics 2014 30(18):2584-2591
DOI: 10.1093/bioinformatics/btu337

Abstract

Motivation: Energy landscapes provide a valuable means for studying the folding dynamics of short RNA molecules in detail by modeling all possible structures and their transitions. Higher abstraction levels based on a macro-state decomposition of the landscape enable the study of larger systems; however, they are still restricted by huge memory requirements of exact approaches.

Results: We present a highly parallelizable local enumeration scheme that enables the computation of exact macro-state transition models with highly reduced memory requirements. The approach is evaluated on RNA secondary structure landscapes using a gradient basin definition for macro-states. Furthermore, we demonstrate the need for exact transition models by comparing two barrier-based approaches, and perform a detailed investigation of gradient basins in RNA energy landscapes.

Availability and implementation: Source code is part of the C++ Energy Landscape Library available at http://www.bioinf.uni-freiburg.de/Software/.

Whenever it comes to analyzing RNA-seq experiments, there is a need for comparing expression data at a quantitative level. Consider a scenario where samples were taken from different conditions and subjected to Illumina sequencing. Whether those samples were multiplexed or sequenced on a single lane each, one generally gets a different number of raw reads from each sample, reflecting experimental and technical biases inherent in the RNA-seq protocols. Various measures for normalization of RNA-seq samples have been proposed, the most widely used being RPKM (reads per kilobase per million). While RPKM tries to account for different sequencing depth by normalizing by the number of reads sequenced in a specific sample, divided by 10^6, this very step causes a systematic bias, as has been shown recently (see Wagner et al., Theory Biosci (2012);131(4):281-5 and Li et al., Bioinformatics (2010);26(4):493-500).

The central point of these papers is to work out an alternative measure for RNA-seq expression abundance that resembles as closely as possible the relative molar concentration (rmc) of each RNA species present in a sample. It is easy to see that the average rmc across genes has to be a constant that only depends on the number of genes mapped in an RNA-seq experiment.

One measure that fulfills the invariant average criterion is transcripts per million (TPM), defined as $\mathrm{TPM}_g = \frac{t_g \cdot 10^6}{T}$, where t_g is a proxy for the number of transcripts that can be explained by a certain number of mapped reads and T is the sum of all t_g over all genes. If one is interested in mRNA abundance, the average TPM, and thus the average rmc, is inversely proportional to the number of features present in a reference annotation.

Practically, TPM values for individual genes can be computed from read count tables, i.e. tables that give the number of reads overlapping a specific gene. Typical programs for obtaining read count tables are htseq-count or multiBamCov (bedtools multicov).
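To make the arithmetic concrete, here is a small sketch that computes TPM from a simplified read count table with awk. It is not the implementation used in ViennaNGS. It assumes the proxy t_g = r_g * rl / fl_g (read count times read length divided by feature length), which is my reading of Wagner et al., and a hypothetical three-column, tab-separated input layout (gene ID, feature length, read count) without a header line. The name compute_tpm is likewise made up for this example.

```shell
# Sketch: compute TPM from a table with columns gene, feature length, reads.
# Assumptions (not from the original post): t_g = reads * read_length / length,
# tab-separated input, no header line. compute_tpm is a hypothetical name.
compute_tpm() {
    local read_length="$1"
    local table="$2"
    # The table is read twice: pass 1 sums T, pass 2 prints the TPM values.
    awk -v rl="${read_length}" '
        BEGIN { FS = "\t" }
        $2 <= 0 { next }                       # skip features without a valid length
        NR == FNR { T += $3 * rl / $2; next }  # pass 1: T = sum of all t_g
        T > 0 { printf "%s\t%.2f\n", $1, ($3 * rl / $2) * 1e6 / T }
    ' "${table}" "${table}"
}

# Toy example: two genes with equal read counts but different lengths
demo_table="$(mktemp)"
printf 'geneA\t1000\t100\ngeneB\t2000\t100\n' > "${demo_table}"
compute_tpm 50 "${demo_table}"
rm -f "${demo_table}"
```

In the toy example both genes receive the same number of reads, but the shorter gene ends up with twice the TPM of the longer one, and the two values add up to 10^6.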

I have recently implemented normalize_multicov.pl, a tool for computing normalized RNA-seq expression in terms of TPM from multicov files. It is part of the ViennaNGS Perl Modules for NGS analysis and very easy to use: Just provide it with the output of a bedtools multicov run on your data as well as the read length used for sequencing your samples and get back a normalized multicov file of your samples in terms of TPM. That’s all …

In silico identification of bacterial transcription start sites (TSS) has been a major challenge for the last years. To address this issue, we have developed TSSAR, a statistical method for analyzing dRNA-seq data, together with colleagues from the Bioinformatics department at the University of Leipzig.

The TSSAR method paper is now out:

TSSAR: TSS annotation regime for dRNA-seq data
Fabian Amman, Michael T Wolfinger, Ronny Lorenz, Ivo L Hofacker, Peter F Stadler, Sven Findeiß
BMC Bioinformatics 2014, 15:89
DOI: 10.1186/1471-2105-15-89

Abstract

Background

Differential RNA sequencing (dRNA-seq) is a high-throughput screening technique designed to examine the architecture of bacterial operons in general and the precise position of transcription start sites (TSS) in particular. Hitherto, dRNA-seq data were analyzed by visualizing the sequencing reads mapped to the reference genome and manually annotating reliable positions. This is very labor intensive and, due to the subjectivity, biased.

Results

Here, we present TSSAR, a tool for automated de-novo TSS annotation from dRNA-seq data that respects the statistics of dRNA-seq libraries. TSSAR uses the premise that the number of sequencing reads starting at a certain genomic position within a transcriptional active region follows a Poisson distribution with a parameter that depends on the local strength of expression. The differences of two dRNA-seq library counts thus follow a Skellam distribution. This provides a statistical basis to identify significantly enriched primary transcripts.

We assessed the performance by analyzing a publicly available dRNA-seq data set using TSSAR and two simple approaches that utilize user-defined score cutoffs. We evaluated the power of reproducing the manual TSS annotation. Furthermore, the same data set was used to reproduce 74 experimentally validated TSS in H. pylori from reliable techniques such as RACE or primer extension. Both analyses showed that TSSAR outperforms the static cutoff-dependent approaches.

Conclusions

Having an automated and efficient tool for analyzing dRNA-seq data facilitates the use of the dRNA-seq technique and promotes its application to more sophisticated analysis. For instance, monitoring the plasticity and dynamics of the transcriptomal architecture triggered by different stimuli and growth conditions becomes possible.

The main asset of a novel tool for dRNA-seq analysis that reaches out to a broad user community is usability. As such, we provide TSSAR both as intuitive RESTful Web service [http://rna.tbi.univie.ac.at/TSSAR] together with a set of post-processing and analysis tools, as well as a stand-alone version for use in high-throughput dRNA-seq data analysis pipelines.

Keywords

Differential RNA sequencing, dRNA-seq, TSS, Transcription start site annotation, Transcriptome, RESTful Web service, Next generation sequencing

I recently had to install IBM’s Tivoli Storage Manager (TSM) client under Debian 7. The task seemed straightforward: download and install the binaries from IBM’s website and you’re done. The problem is that IBM does not provide pre-built Debian packages; they only offer RPMs of the TSM client. One could of course start to fiddle around with rpm2cpio, but I’m not comfortable with that. I’d rather use these pre-built binary Debian packages (based on TSM client 6.4.0).

mkdir -p ~/Download/TSM
cd ~/Download/TSM
wget http://www.univie.ac.at/vsi/backup/tsm-ubuntu.tar.gz

Those .deb packages have an awkward dependency, the i386 version of libstdc++5. If your machine is based on an x86_64 architecture, you might want to add support for i386 architectures by issuing:

sudo dpkg --add-architecture i386
sudo apt-get update

The remaining installation procedure is straightforward:

sudo apt-get install libstdc++5:i386 ksh
ls -l | awk '/64/||/BA-/||/BAc/ {print $9}' | xargs -I %s sudo dpkg -i %s

The TSM client executable is installed in /opt/tivoli/tsm/client/ba/bin. You might want to copy two config files and edit them to fit your requirements.

cd /opt/tivoli/tsm/client/ba/bin/
cp dsm.sys.smp dsm.sys
cp dsm.opt.smp dsm.opt

My dsm.sys and dsm.opt look like this:

See also the German version of the Linux TSM client documentation at the University of Vienna Computer Center’s website.

Raw next-generation-sequencing (NGS) data is often shipped in unaligned BAM format, whereas most mappers expect input data in FASTQ format. I’ve written a little bash script that does the preprocessing job for paired-end data: Convert raw reads to FASTQ with bam2fastq, trim adapters with cutadapt and perform quality control checks with FastQC.
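The three steps can be chained roughly as follows. This is a sketch under stated assumptions rather than the script mentioned above: preprocess_pe, the DRY_RUN switch and the output file naming are made up for this example, and the adapter sequences are placeholders you need to replace with those of your library kit. I am assuming the bam2fastq convention that a # in the output name is replaced by _1 and _2 for the two mates; please check this against the version installed on your system.

```shell
# Sketch of a paired-end preprocessing chain: bam2fastq, cutadapt, FastQC.
# preprocess_pe and DRY_RUN are hypothetical names introduced for illustration.
# With DRY_RUN set, the commands are only printed, not executed.
preprocess_pe() {
    local bam="$1"          # unaligned BAM file
    local adapter_fwd="$2"  # 3' adapter expected in read 1
    local adapter_rev="$3"  # 3' adapter expected in read 2
    local outdir="$4"
    local base
    base="$(basename "${bam}" .bam)"
    local run=""
    if [ -n "${DRY_RUN:-}" ]; then
        run="echo"
    fi

    ${run} mkdir -p "${outdir}"
    # Assumption: bam2fastq expands '#' to _1 / _2 for the two mates
    ${run} bam2fastq -o "${outdir}/${base}#.fastq" "${bam}"
    # Trim adapters from both mates in one paired-end cutadapt run
    ${run} cutadapt -a "${adapter_fwd}" -A "${adapter_rev}" \
        -o "${outdir}/${base}_1.trimmed.fastq" \
        -p "${outdir}/${base}_2.trimmed.fastq" \
        "${outdir}/${base}_1.fastq" "${outdir}/${base}_2.fastq"
    # Quality control on the trimmed reads
    ${run} fastqc -o "${outdir}" \
        "${outdir}/${base}_1.trimmed.fastq" "${outdir}/${base}_2.trimmed.fastq"
}

# Print the commands for a hypothetical sample without running anything
DRY_RUN=1 preprocess_pe sample.bam ADAPTER_FWD ADAPTER_REV preprocessed
```

Running the function without DRY_RUN executes the commands, which requires bam2fastq, cutadapt and fastqc to be on your PATH.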