Today I was in a hurry and had to calculate RPKM values from RNA-Seq datasets. I performed the analysis using the Subread package; here is a summary of the process for colleagues in a similar situation.

Download and installation of package

Download a Subread binary distribution that suits your operating system from the SourceForge website. Binaries for both 64-bit and 32-bit machines are available for all major operating systems, or the package can be built from source. The commands below assume Ubuntu and build from the source package; for other operating systems, consult the Subread website.

  1. Uncompress the downloaded file using "tar zxvf subread-1.x.x.tar.gz"
  2. Enter the src subdirectory of the package and run "make -f Makefile.Linux"

A new subdirectory called bin will be created under the top-level directory of the package, and the executables generated by the build will be saved there. For easy access, add that bin directory to your search path in ~/.bash_profile or ~/.profile.

A quick start

  1. Index the desired genome (replace /path/to with the real directory; do not use $PATH as a placeholder, because the shell reserves it for the executable search path): /path/to/subread-1.6.2-Linux-x86_64/bin/subread-buildindex -o Genome /path/to/genome.fa
  2. Map the RNA-Seq reads (-t 0 selects RNA-Seq mode; -T sets the number of threads, default 1). Paired-end dataset: ../subread-1.6.2-Linux-x86_64/bin/subread-align -i Genome -r R1.fastq -R R2.fastq -t 0 -T 8 -o Output.bam; single-end dataset: ../subread-1.6.2-Linux-x86_64/bin/subread-align -i Genome -r R1.fastq -t 0 -o subread_results.bam
  3. Assign the mapped RNA-Seq reads to genes using a GTF annotation file: ../subread-1.6.2-Linux-x86_64/bin/featureCounts -T 8 -a /path/to/genes.gtf -o Summary_Counts.txt *.bam
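
If you need to run the three quick-start steps for many samples, the commands can be assembled programmatically. The following is only a minimal Python sketch: the function name build_pipeline_commands and its parameters are hypothetical, and it simply reproduces the flags used in the steps above (-o, -i, -r, -R, -t, -T, -a) without running anything.

```python
import os
import shlex


def build_pipeline_commands(subread_bin, genome_fasta, gtf, r1, r2=None,
                            index="Genome", threads=1,
                            bam="Output.bam", counts="Summary_Counts.txt"):
    """Return the index, align and count commands as argument lists.

    Hypothetical helper: it only mirrors the flags shown in the quick start.
    """
    index_cmd = [os.path.join(subread_bin, "subread-buildindex"),
                 "-o", index, genome_fasta]

    align_cmd = [os.path.join(subread_bin, "subread-align"),
                 "-i", index, "-r", r1]
    if r2 is not None:
        # Second read file is given only for paired-end data.
        align_cmd += ["-R", r2]
    # -t 0 selects RNA-Seq mode, -T sets the number of threads.
    align_cmd += ["-t", "0", "-T", str(threads), "-o", bam]

    count_cmd = [os.path.join(subread_bin, "featureCounts"),
                 "-T", str(threads), "-a", gtf, "-o", counts, bam]
    return [index_cmd, align_cmd, count_cmd]


if __name__ == "__main__":
    commands = build_pipeline_commands(
        "../subread-1.6.2-Linux-x86_64/bin", "genome.fa", "genes.gtf",
        "R1.fastq", "R2.fastq", threads=8)
    for cmd in commands:
        # Print shell-quoted commands for inspection.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) once you have verified the printed commands against your own paths.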

RPKM calculations

Prepare the output file "Summary_Counts.txt" as input for the RPKM calculation script.

  1. Open the file in Excel.
  2. Delete the first line, "# Program:featureCounts v1.6.2; Command..."
  3. Place a "#" sign at the start of the header line, before "Geneid".
  4. Delete columns 2, 3, 4 and 5, named "Chr Start End Strand".
  5. Cut the "Length" column and paste it as the last column.
  6. The new header should look like "#Geneid Sample1.bam Sample2.bam Sample3.bam Sample4.bam Length".
  7. Save the file as "Summary_Counts_Input.txt".
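
The manual steps above can also be scripted, which avoids Excel changing gene names or number formats. This is a minimal Python sketch, assuming the featureCounts table is tab-delimited with the columns Geneid, Chr, Start, End, Strand, Length followed by one column per BAM file; the function name prepare_counts_table and the sample data are made up for illustration.

```python
def prepare_counts_table(featurecounts_text):
    """Reshape featureCounts output into the layout described above."""
    # Step 2: drop the leading "# Program:..." comment line (and blank lines).
    rows = [line.split("\t")
            for line in featurecounts_text.splitlines()
            if line.strip() and not line.startswith("#")]
    header, body = rows[0], rows[1:]

    # Step 4: columns to delete; step 5: Length moves to the end.
    drop = {header.index(name) for name in ("Chr", "Start", "End", "Strand")}
    length_idx = header.index("Length")
    keep = [i for i in range(len(header))
            if i not in drop and i != length_idx]
    keep.append(length_idx)

    # Step 3: prefix the header line with "#".
    out = ["#" + "\t".join(header[i] for i in keep)]
    out += ["\t".join(row[i] for i in keep) for row in body]
    return "\n".join(out) + "\n"


# Made-up two-gene, two-sample example.
SAMPLE = (
    "# Program:featureCounts v1.6.2; Command...\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tSample1.bam\tSample2.bam\n"
    "geneA\tchr1\t100\t1100\t+\t1000\t10\t20\n"
    "geneB\tchr2\t500\t2500\t-\t2000\t30\t40\n"
)

if __name__ == "__main__":
    print(prepare_counts_table(SAMPLE), end="")
```

To use it on a real file, read Summary_Counts.txt into a string, pass it to the function, and write the result to Summary_Counts_Input.txt.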

Download the RPKM calculation script from "https://github.com/santhilalsubhash/rpkm_rnaseq_count/blob/master/rpkm_script_beta.pl" written by Gridhrahakshi.

Run the script using the following command, where 2:5 is the range of sample count columns and 6 is the gene length column:

perl rpkm_script_beta.pl Summary_Counts_Input.txt 2:5 6 > OUTPUT_RPKM_FILE

The general form is:

perl rpkm_script_beta.pl Summary_Counts_Input.txt ActualColumnStart:ActualColumnEnd ColumnGeneLength > OUTPUT_RPKM_FILE
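
For reference, RPKM is the read count of a gene multiplied by 10^9 and divided by the product of the gene length in bases and the total number of mapped reads in the sample. The sketch below illustrates that formula in Python. It is not the Perl script itself, and it assumes that the library size is the sum of the counts in the sample column, which may differ from what the script uses.

```python
def rpkm(counts, lengths):
    """Compute RPKM for one sample.

    counts  -- read counts per gene
    lengths -- gene lengths in bases, in the same order
    Assumption: library size = sum of the counts in this sample.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no assigned reads")
    # RPKM = count * 1e9 / (gene length * total reads)
    return [count * 1e9 / (length * total)
            for count, length in zip(counts, lengths)]


if __name__ == "__main__":
    # Made-up example: two genes, 40 reads in total.
    print(rpkm([10, 30], [1000, 2000]))  # [250000.0, 375000.0]
```

The values are large here only because the toy sample has 40 reads; with millions of reads per sample the same formula gives the familiar RPKM range.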


I hope this general guide helps you with read count analysis and RPKM calculation. Please leave your feedback in the comments section.

There are many different methods for analyzing miRNA-Seq and RNA-Seq data, and the majority of them involve a lot of installation and manual effort. To address this, many analysis suites and software packages are available online. Today I am going to analyze data using the miARma-Seq suite, which was published in Nature's Scientific Reports in May 2016 (Original Article). Besides miARma-Seq, many other tools and analysis suites are available, listed below:

  1. Tools for gene expression analysis, such as ExpressionPlot, GENE-counter, RobiNA, TCW, Grape RNA-Seq or MAP-RSeq
  2. Tools focused on the analysis of miRNA expression profiles, such as DSAP, miRanalyzer, miRExpress, miRNAkey, iMir, CAP-miRSeq, mirTools 2.0 or sRNAtoolbox
  3. Tools implemented to perform both RNA-Seq and miRNA-Seq analysis, such as wapRNA, eRNA, BioVLAB-MMIA-NGS or Omics Pipe
  4. Suites integrating several programs to enable different types of NGS analyses, such as GALAXY (https://galaxyproject.org/), QuasR, RAP, Subread/edgeR or ViennaNGS

I found miARma-Seq the most convenient and easiest to install of all the available options, so I am going to install it and apply it to my datasets. I am starting completely from scratch on Ubuntu, using an Amazon EC2 cloud instance. First we have to install all the prerequisites; if you are using an existing server, many of them may already be installed.

  • Install make using "sudo apt install make"
  • Install the GCC compiler. I used the following commands:

sudo apt-get update && \
sudo apt-get install build-essential software-properties-common -y && \
sudo add-apt-repository ppa:ubuntu-toolchain-r/test -y && \
sudo apt-get update && \
sudo apt-get install gcc-snapshot -y && \
sudo apt-get update && \
sudo apt-get install gcc-6 g++-6 -y && \
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-6 60 --slave /usr/bin/g++ g++ /usr/bin/g++-6 && \
sudo apt-get install gcc-4.8 g++-4.8 -y && \
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 60 --slave /usr/bin/g++ g++ /usr/bin/g++-4.8;

  • Test GCC installation by checking its version

gcc -v


  • Install Perl. I installed Perl 5.6.1, but you can install the latest available version

wget http://www.cpan.org/src/5.0/perl-5.6.1.tar.gz
tar -zxvf perl-5.6.1.tar.gz
cd perl-5.6.1/
rm -f config.sh Policy.sh
sh Configure -de
make test
make install

  • Install R

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
sudo add-apt-repository 'deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu xenial/'
sudo apt-get update
sudo apt-get install r-base

  • Test R installation

sudo -i R


  • Install Java

sudo apt-get update
sudo apt-get install default-jre
sudo apt-get install default-jdk

  • Test the Java installation

java -version

  • Install Bioconductor packages in R. You need to start R with administrative permissions, or use a local R installation; in that case, don't forget to add the path of the local R to your .bashrc

sudo R

  • Install miARmaSeq suite 

mkdir NGS
cd NGS
curl -L -O https://bitbucket.org/cbbio/miarma/get/master.tar.bz2
tar -xjf master.tar.bz2
cd cbbio-miARma-*
ls -l

  • Install miARmaSeq Examples

curl -L -O https://sourceforge.net/projects/miarma/files/Examples/Examples_miARma_mRNAs.tar.bz2
tar -xjf Examples_miARma_mRNAs.tar.bz2


  • Test miARma

perl miARma Examples/basic_examples/mRNAs/1.Quality/1.Quality.ini --check
perl miARma Examples/basic_examples/mRNAs/1.Quality/1.Quality.ini

  • Download a genome, or simply give the paths of an already downloaded genome and its matching .gtf file in the .ini files. I preferred to download the genome from iGenomes because it already includes Bowtie1, Bowtie2 and BWA indexes and matching annotation files.

# Replace <password> and <ftp-host> with the FTP credentials and host given for iGenomes
wget ftp://igenome:<password>@<ftp-host>/Homo_sapiens/UCSC/hg19/Homo_sapiens_UCSC_hg19.tar.gz
tar -zxvf Homo_sapiens_UCSC_hg19.tar.gz
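
Before launching an analysis, it can help to confirm that the prerequisites installed above are actually on the search path. The following minimal Python sketch does that with the standard library's shutil.which; the helper names find_tools and missing_tools are hypothetical, and the tool list simply mirrors the programs installed in this guide.

```python
import shutil

# Executables installed in the steps above.
PREREQUISITES = ["make", "gcc", "perl", "R", "java"]


def find_tools(names):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the tool names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    for name, path in find_tools(PREREQUISITES).items():
        print(f"{name:6s} {path or 'NOT FOUND'}")
```

Any tool reported as NOT FOUND either needs to be installed or needs its directory added to your search path.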


You can find further details in the miARma manual (Manual). The suite supports Bowtie1, Bowtie2, HISAT and STAR for mapping, and edgeR and NOISeq for differential expression analysis. It also provides functional annotation.

You can also change miARma's default commands. For example, I wanted to use STAR but could not because of RAM limitations, so I changed the command in /lib/miARma/Aligner.pm. Modify the default commands only if you know what you are doing.

I hope you will enjoy your analysis with miARma. Please feel free to leave your feedback in the comments.



© 2018 BioinfoGuide. All Rights Reserved.