Basic Local Alignment Search Tool (BLAST)


BLAST (Basic Local Alignment Search Tool) is a method for finding regions of similarity between biological sequences. The program takes a query sequence and searches it against a database selected by the user, comparing the query with every subject sequence in that database. The results are reported as a ranked list of hits followed by the individual sequence alignments, together with various scores and statistics. Every hit in the list is assigned a similarity score S, and BLAST then estimates how likely a score that high is to arise by chance. For that purpose an E-value is calculated for every hit: the E-value for a score S is the number of hits with score S or higher that one would expect to find purely by chance in a database of that size.
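To make the E-value definition concrete, here is a minimal sketch of the Karlin-Altschul formula that underlies BLAST statistics: E = K·m·n·e^(−λS), where m is the query length, n is the total database length, and λ and K are statistical parameters that depend on the scoring matrix and gap costs. The λ = 0.267 and K = 0.041 values in the usage note are assumptions for illustration only (values of this order are printed in BLAST reports for BLOSUM62 gapped searches); take the real ones from your own report. Real BLAST also corrects m and n to "effective lengths", so this sketch will not reproduce reported E-values exactly.

```python
import math


def e_value(raw_score: float, lam: float, k: float, query_len: float, db_len: float) -> float:
    """Karlin-Altschul expected number of chance hits with score >= raw_score.

    E = K * m * n * exp(-lambda * S). No effective-length correction is applied.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized (bit) score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value_from_bits(bits: float, query_len: float, db_len: float) -> float:
    """Equivalent form of the E-value using the bit score: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)
```

For example, `e_value(50, 0.267, 0.041, 300, 1e8)` estimates the E-value of a raw score of 50 for a 300-residue query against a database of 10^8 residues. Two properties follow directly from the formula: a higher score gives an exponentially smaller E-value, and doubling the database size doubles the E-value, which is why the same alignment can look less significant against a larger database.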
For a detailed discussion of the statistics used in BLAST, check the following link.

This program can be accessed directly online on the NCBI web server, or BLAST can be downloaded and run locally. It also offers an API that can be used from other applications. In this tutorial I am only going to discuss the basic types and options of the BLAST programs; most of the material on the NCBI website is self-explanatory.
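As a rough illustration of the API route, the sketch below only builds request URLs for NCBI's BLAST URL API and does not send anything. It assumes the `Blast.cgi` endpoint and the parameter names `CMD`, `PROGRAM`, `DATABASE`, `QUERY`, `EXPECT`, `RID` and `FORMAT_TYPE` as described in NCBI's URL API documentation; check that documentation (including its usage limits, such as how often a job may be polled) before relying on it. The RID in the usage note is a hypothetical placeholder.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI BLAST URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_request(query: str, program: str = "blastp", database: str = "nr",
                      expect: float = 10.0) -> str:
    """Build a URL that would submit a search (CMD=Put); NCBI replies with a request ID (RID)."""
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,      # FASTA text or an accession/GI number
        "EXPECT": expect,    # E-value reporting threshold
    }
    return f"{BLAST_URL}?{urlencode(params)}"


def build_get_request(rid: str, format_type: str = "Text") -> str:
    """Build a URL that would fetch results (CMD=Get) for a previously returned RID."""
    return f"{BLAST_URL}?{urlencode({'CMD': 'Get', 'RID': rid, 'FORMAT_TYPE': format_type})}"
```

For instance, `build_put_request(">q1\nMKT")` percent-encodes the FASTA header and newline into the `QUERY` parameter, and `build_get_request("ABC123")` (a made-up RID) would be the follow-up call once the search has finished. Long queries are usually better sent as an HTTP POST body rather than in the URL.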


Types of BLAST programs

We can divide the BLAST programs into two categories depending on their functionality: general search tools and specialized search tools. First I will discuss the general search tools, which all have a very similar interface and features.

  1. BLASTP compares an amino acid query sequence against a protein sequence database

  2. BLASTN compares a nucleotide query sequence against a nucleotide sequence database

  3. BLASTX compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database

  4. TBLASTN compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands)

  5. TBLASTX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database
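BLASTX, TBLASTN and TBLASTX all rely on six-frame translation: a nucleotide sequence is translated in three reading frames on the given strand and three on its reverse complement. The sketch below shows that step using the standard genetic code (NCBI translation table 1, encoded as the usual 64-letter string with codons ordered T, C, A, G). Codons containing ambiguous bases are translated as `X`, and `*` marks a stop codon. The frame labels are a simple convention chosen here: frame −1 starts at the first base of the reverse complement.

```python
# Standard genetic code (NCBI table 1); codons ordered by bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the start of seq; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict:
    """Translate seq in frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    forward = seq.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

For example, `six_frame_translation("ATGGCC")` translates frame +1 to `MA` and frame −1 (reverse complement `GGCCAT`) to `GH`. BLASTX applies this to the query, TBLASTN to the database sequences, and TBLASTX to both.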

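Sequence input: BLAST accepts the query either as a sequence in FASTA format or as an accession number or GI number. FASTA is a widely used plain-text format: a definition line starting with ">" (for example >Sequence_Definition), followed on the next lines by the sequence itself, conventionally wrapped at 60 characters per line.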
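A minimal sketch of reading and writing the FASTA layout described above, assuming plain text input. Everything after ">" on the definition line is kept as the header, sequence lines are joined until the next ">", and the writer wraps sequences at 60 characters. Real-world parsers (for example Biopython's `SeqIO`) handle more edge cases than this.

```python
def parse_fasta(text: str) -> list:
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' definition line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Write one FASTA record with the sequence wrapped at `width` characters per line."""
    lines = [f">{header}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"
```

For instance, `parse_fasta(">seq1 demo\nACGT\nAC\n")` returns `[("seq1 demo", "ACGTAC")]`, and formatting a 130-residue sequence produces two full 60-character lines plus a final line of 10.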

Subject databases: Many databases can be used as the subject database. One of the most commonly used is the nr database, a collection of "non-redundant" sequences from GenBank and other sequence databanks. There are many other options to choose from depending on the requirement, e.g. the Protein Data Bank (PDB) or the RefSeq Genome Database (RefSeq Genomes).

FILTER (Low-complexity): Masks off segments of the query sequence that have low compositional complexity, i.e. regions of biased composition such as short-period repeats, so that they do not produce spurious high-scoring hits.
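To show what "low compositional complexity" means in practice, here is a simplified masker based on Shannon entropy in a sliding window. This is a stand-in technique, not the algorithm BLAST uses (BLAST's filters are SEG for protein queries and DUST for nucleotide queries), and the window size of 12 and entropy cutoff of 1.5 bits are arbitrary assumptions for nucleotide input. Every position inside a window whose entropy falls below the cutoff is replaced with `N`.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits per symbol) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, min_entropy: float = 1.5,
                        mask_char: str = "N") -> str:
    """Mask every position covered by a window whose entropy is below min_entropy."""
    seq = seq.upper()
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(mask_char if m else base for base, m in zip(seq, masked))
```

A homopolymer run such as a stretch of 20 `A`s has zero entropy and is masked, along with a few flanking bases whose windows overlap it. Note that a pure composition measure like this catches homopolymers and dinucleotide repeats but not every short-period repeat (a perfect `ACGT` repeat has maximal entropy), which is one reason dedicated filters such as DUST are used in practice.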


Understanding BLAST results: BLAST results are largely self-explanatory, but a few key terms will help you understand them better.

Query Coverage: The percentage of the query sequence length that is covered by the aligned regions. Higher is better: if only 10% of the query aligns, even at 100% identity to the target sequence, the hit is usually not considered a good match.
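Because one hit can consist of several local alignments (HSPs) that overlap on the query, coverage is computed over the union of the aligned query ranges rather than by adding up their lengths. A minimal sketch, assuming 1-based inclusive query coordinates as printed in BLAST alignments; the function name and input shape are illustrative, not part of any BLAST API.

```python
def query_coverage(query_len: int, hsp_ranges: list) -> float:
    """Percentage of query positions covered by at least one (start, end) HSP range.

    Ranges are 1-based and inclusive; overlapping or adjacent ranges are merged
    so that no query position is counted twice.
    """
    spans = sorted((min(s, e), max(s, e)) for s, e in hsp_ranges)
    covered = 0
    cur_start = cur_end = None
    for start, end in spans:
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                covered += cur_end - cur_start + 1  # close the previous merged span
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # extend the current merged span
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / query_len
```

For a 100-residue query with HSPs at 1–30, 20–50 and 71–80, the first two merge into 1–50, giving 60 covered positions and a coverage of 60%, not the 71% that naively summing the three lengths would suggest.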

EXPECT value: The statistical significance threshold for reporting matches against database sequences. The default value is 10, meaning that 10 matches are expected to be found merely by chance. If the E-value of a match is greater than the EXPECT threshold, the match is not reported. Increasing the EXPECT threshold therefore allows less significant matches to be reported, while lowering it makes the search more stringent.
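The reporting rule above amounts to a simple filter: keep a hit only if its E-value does not exceed the EXPECT threshold, and list the smallest (most significant) E-values first. A minimal sketch, assuming hits are given as hypothetical `(hit_id, e_value)` pairs rather than any real BLAST output format.

```python
def report_hits(hits: list, expect_threshold: float = 10.0) -> list:
    """Keep (hit_id, e_value) pairs with e_value <= expect_threshold, most significant first."""
    kept = [hit for hit in hits if hit[1] <= expect_threshold]
    return sorted(kept, key=lambda hit: hit[1])
```

With three made-up hits whose E-values are 1e-30, 0.5 and 25, the default threshold of 10 reports the first two, a strict threshold of 0.001 reports only the first, and raising the threshold to 100 reports all three.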

Identity: The percentage of aligned positions at which the query and target sequences have identical residues.
In short, BLAST results should be judged only after checking all three values: query coverage, E-value and identity.
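Percent identity can be read straight off a pairwise alignment. The sketch below, which assumes the two aligned strings are equal-length with `-` marking gaps, counts identical non-gap columns and divides by the full alignment length, gaps included, which matches how BLAST reports "Identities = x/y". Other tools sometimes use a different denominator, so identity values from different programs are not always directly comparable.

```python
def alignment_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of alignment columns where both sequences have the same (non-gap) residue."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    return 100.0 * identical / len(aligned_query)
```

For the alignment `ACGT-A` / `ACGTTA`, five of the six columns are identical, so the identity is 5/6, about 83%.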


The second category of BLAST tools contains the specialized search tools, which are listed below:

  1. SmartBLAST: It can be used to find proteins that are highly similar to the query sequence
  2. Primer-BLAST: One of the most important tools for designing primers specific to a given PCR template
  3. GlobalAlign: An implementation of the Needleman-Wunsch algorithm for global alignment of two sequences across their entire span
  4. CD-Search: It is used for finding conserved domains in a sequence, which is important in evolutionary genomics, motif prediction etc.
  5. GEO: It finds matches to gene expression profiles by searching the Gene Expression Omnibus (GEO) database
  6. IgBLAST: It is used for searching immunoglobulin and T-cell receptor sequences and is widely used in immunology
  7. VecScreen: It is used for screening sequences for vector contamination. Vector contamination can cause problems in any kind of analysis, so all vector sequences should be removed from the query before performing further alignment or analysis
  8. CDART: This tool finds sequences with a similar conserved domain architecture and is widely used in proteomics, evolutionary genomics etc.
  9. TargetedLoci: Another valuable tool for evolutionary biology, able to search markers used for phylogenetic analysis
  10. Multiple Alignment: It is used for multiple alignment of sequences using domain and protein constraints
  11. BioAssay: This tool can be used for searching protein or nucleotide targets in PubChem BioAssay, a large public repository of small-molecule and RNAi screening data that has provided open access to its content since 2004.
  12. MOLE-BLAST: Classifies multiple query sequences and discovers their relationships to each other, providing a taxonomic context for the queries. It is intended to work with a specific locus from a set of organisms rather than with sequences such as an entire genome or unannotated contigs.
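GlobalAlign is described above as an implementation of the Needleman-Wunsch algorithm. A minimal sketch of that algorithm follows, assuming a simple scoring scheme of +1 for a match, −1 for a mismatch and a linear gap penalty of −2; these values are chosen for brevity, and NCBI's tool uses its own scoring (substitution matrices and affine gap costs). The table is filled by dynamic programming and one optimal alignment is recovered by tracing back from the bottom-right cell. Unlike BLAST's local alignments, the alignment spans both sequences end to end.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Traceback from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))
```

Aligning `ACGT` with `AGT` under these scores gives a best score of 1: three matches (+3) and one unavoidable gap (−2).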