Subread package for NGS read count analysis

Today, I was in hurry and I had to calculate RPKM values from RNA-Seq datasets. I performed the analysis using subread package, here is the summary of process for all fellow colleagues in similar situation. 

Download and installation of package

Download a Subread binary distribution that suits your operating system from SourceForge website. Binaries for both 64-bit and 32-bit machines with any major operating system is available for download or it can be installed from source. Following code will focus on machines having Ubuntu OS for other's one can directly consult from subread website.

  1. Uncompress the downloaded files using " tar zxvf subread-1.x.x.tar.gz"
  2. Enter the src subdirectory of the package and type "make -f Makefile.Linux"

A new subdirectory called bin will be created under the home directory of the package, and the executables generated from the build will be saved to that subdirectory. To enable easy access add the path to them to your search path (~/.bash_profile OR ~/.profile)

A quick start

  1. Indexing desired genome: $PATH/subread-1.6.2-Linux-x86_64/bin/subread-buildindex -o Genome $PATH/genome.fa
  2. Mapping paired end data RNA-Seq dataset: ../subread-1.6.2-Linux-x86_64/bin/subread-align -i Genome -r R1.fastq -R R2.fastq -t 0 -o Output.bam -T (Number of threads: Default 1); single end RNA-Seq dataset: ../subread-1.6.2-Linux-x86_64/bin/subread-align -i Genome -r R1.fastq -t 0 -o subread_results.bam
  3. Assign mapped RNA-seq reads to desired genome using inbuilt annotation: ../subread-1.6.2-Linux-x86_64/bin/featureCounts -a $PATH/genes.gtf -o Summary_Counts.txt *.bam -T 8

RPKM calculations

Prepare the output file "Summary_Counts.txt" for input in RPKM calculation script.

  1. Open the file in excel.
  2. Delete first line "# Program:featureCounts v1.6.2; Command..."
  3. Place "#" sign in first line before "Geneid"
  4. Delete columns 2,3,4 and 5 named "Chr Start End Strand"
  5. Cut the length column and paste as last column.
  6. New header should be like "#Geneid Sample1.bam Sample2.bam Sample3.bam Sample4.bam Length"
  7. Save as "Summary_Counts_Input.txt"

Download the RPKM calculation script from "https://github.com/santhilalsubhash/rpkm_rnaseq_count/blob/master/rpkm_script_beta.pl" written by Gridhrahakshi.

Run the script using following command:

perl rpkm_script_beta.pl Summary_Counts_Input.txt 2:5 6 > OUTPUT_RPKM_FILE 

perl rpkm_script_beta.pl Summary_Counts_Input.txt ActualColumnStart:ActualColumnEnd ColumnGeneLength > OUTPUT_RPKM_FILE 


I hope this general guide will help you for read count analysis and RPKM calculation. Kindly, leave your valuable feedback in comments section.

You have no rights to post comments



© 2018 BioinfoGuide. All Rights Reserved.