The dynamic time warping (DTW) algorithm estimates the similarity between two sequences that may vary in time or speed. For example, using DTW we can detect the walking pattern of one person who walks at different speeds in different videos, even if the person accelerates or decelerates during the observation. DTW has a variety of practical applications and is used to analyze any data that can be represented as a linear sequence, including audio, video and graphics. A well-known application of DTW is coping with different speaking speeds in automatic speech recognition.

Finding recurring patterns in process data, also referred to as motif-matching, may reveal diagnostic information to engineers and operators. Dynamic Time Warping (DTW) is one of the most widely used techniques for performing these motif matches with certain restrictions. The sequences are "warped" non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension. This sequence alignment method is often used in the context of hidden Markov models.

One example of the restrictions imposed on the matching of the sequences is the monotonicity of the mapping in the time dimension. Continuity is less important in DTW than in other pattern matching algorithms; DTW is particularly suited to matching sequences with missing information, provided there are long enough segments for matching to occur. A sample implementation of the algorithm is available on Wikipedia.
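To make the idea concrete, here is a minimal sketch of DTW as a dynamic programming table. It is not the Wikipedia implementation; the function name `dtw_distance` and the choice of the absolute difference as the local cost are my own assumptions for illustration. The monotonicity restriction shows up in the three allowed moves: the alignment can only step forward in one sequence, the other, or both.

```python
import math

def dtw_distance(a, b):
    """Return the DTW cost between two numeric sequences.

    Assumption for this sketch: the local cost is |x - y|.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Monotonicity: the path only ever moves forward in a, in b, or in both
            cost[i][j] = local + min(cost[i - 1][j],      # advance in a only
                                     cost[i][j - 1],      # advance in b only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]

# The same "pattern" recorded at half speed and at full speed
slow = [0, 0, 1, 1, 2, 2, 3, 3]
fast = [0, 1, 2, 3]
print(dtw_distance(slow, fast))  # 0.0: the two sequences align perfectly
```

The two example sequences have different lengths, so a point-by-point comparison is not even defined for them, yet DTW reports a cost of zero because every sample of the slow sequence can be matched to an identical sample of the fast one.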

DNA classification methods most commonly compare differences between symbolic DNA records, which requires global multiple sequence alignment. This approach is often inappropriate: it introduces a number of imprecisions and requires additional user intervention to align similar segments exactly. When DNA is represented as a signal, similar segments are characterized by a similar shape of the curve. Alignment of genomic signals can adjust whole sections, not only individual symbols. Dynamic time warping (DTW) is suitable for this purpose and can replace the multiple alignment of symbolic sequences in applications such as phylogenetic analysis (Classification of genomic signals using dynamic time warping; Skutkova et al., 2013).

Classification of genomic signals using dynamic time warping is an adequate alternative to phylogenetic analysis based on the alignment of symbolic DNA sequences; in addition, it is a robust, quick and more precise technique.
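To give a feel for what "DNA represented as a signal" can mean, here is a small sketch that turns a DNA string into a numeric curve. The mapping is my own assumption for illustration (purines step up, pyrimidines step down, a so-called DNA walk); it is not necessarily the representation used by Skutkova et al. The names `STEP` and `dna_walk` are hypothetical.

```python
from itertools import accumulate

# Hypothetical mapping chosen only for illustration:
# purines (A, G) step up, pyrimidines (C, T) step down.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}

def dna_walk(seq):
    """Convert a DNA string into a cumulative numeric signal.

    Raises KeyError for symbols other than A, C, G, T (e.g. N).
    """
    return list(accumulate(STEP[base] for base in seq.upper()))

print(dna_walk("AAGCTTA"))  # [1, 2, 3, 2, 1, 0, 1]
```

Once sequences are curves like this, a segment shared by two sequences has the same shape in both, and a signal alignment method such as DTW can match those shapes section by section.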

Next generation sequencing (NGS), or second generation sequencing, refers to any of the high-throughput, massively parallel approaches to DNA sequencing. These technologies started to emerge in the late 1990s and have been commercially available since 2005. New NGS platforms can generate from 1 million to 43 billion short reads in a single run. A number of different NGS platforms are available commercially, each with its own sequencing chemistry and engineering configuration. The major NGS platforms and their basic features are shown in the table below (source: Wikipedia).


Platform | Template Preparation | Chemistry | Max Read Length (bases) | Run Time (days) | Max Gb per Run
Roche 454 | | | | |
GS FLX Titanium | Clonal-emPCR | Pyrosequencing | 400‡ | 0.42 | 0.035
Illumina MiSeq | Clonal Bridge Amplification | Reversible Dye Terminator | | |
Illumina HiSeq | Clonal Bridge Amplification | Reversible Dye Terminator | | |
Illumina Genome Analyzer IIX | Clonal Bridge Amplification | Reversible Dye Terminator | | |
Life Technologies SOLiD4 | Clonal-emPCR | Oligonucleotide 8-mer Chained Ligation | | |
Life Technologies Ion Proton | Clonal-emPCR | Native dNTPs, proton detection | 200 | 0.5 | 100
Complete Genomics | Gridded DNA-nanoballs | Oligonucleotide 9-mer Unchained Ligation | 7x10 | 11 | 3000
Helicos Biosciences Heliscope | Single Molecule | Reversible Dye Terminator | 35‡ | 8 | 25
Pacific Biosciences SMRT | Single Molecule | Phospholinked Fluorescent Nucleotides | 10,000 (N50); 30,000+ (max) | 0.08 | 0.5



NGS is significantly cheaper and quicker than first generation Sanger sequencing, and it requires less DNA for accurate and reliable sequencing. Most of the process is automated, which means lower labor costs, and NGS also requires fewer reagents.

The first question almost everyone asks is: how can I start, or what should my starting point be? The answer really depends on the tasks you want to perform. No one can excel in all aspects of bioinformatics in just a couple of months or years. It's better to decide what you want to excel at before starting. We can divide the expertise into different categories:

1. Ability to use online tools and web servers: The major web servers are now developed to the point that an ordinary biologist can easily run many kinds of automated analysis on them. Examples include performing pairwise alignment of a genomic or protein sequence using BLAST or BLAT, searching for common mutations in particular cell lines using public databases, and searching for common variants in a particular population. If you want to master these basic skills, it is really easy nowadays. All major platforms have detailed user manuals and interfaces, and you can master one platform in just a couple of days. Keep visiting Bioinfoguide.com, as I plan to cover all the major platforms one by one.

2. Ability to install and set up your own pipelines using preexisting tools: The answer here depends on your previous skills. If you are familiar with any Unix-based system then you are lucky, as the majority of preexisting tools run on Unix platforms. If you are using Windows, you should first get familiar with the basic Unix environment and commands. I recommend installing Ubuntu as a second OS on your system right now (or setting up a virtual environment, although that will slow your learning); there are a lot of great tutorials available online for this purpose. Once you are familiar with a Unix system, the majority of tools and pipelines don't need advanced bioinformatics skills, at least to run on default parameters. Keep an eye on the tutorials on our site; I plan to make tutorials for all the basic NGS tools and pipelines.

3. Ability to manipulate data (important for all kinds of analysis, e.g. modifying large files using your own scripts): Here real bioinformatics starts. If you want to excel in bioinformatics analysis then you should have basic skills in data manipulation, a scripting language and databases. Bioinformatics data is usually big data: you have to perform different kinds of analysis on large files containing genomic sequences or related information. Even if you are just using preexisting tools, you can work without automation only up to a point; beyond a certain threshold you need to automate these processes, and for that you will need basic programming skills. If you want to perform all analyses on your own, start with a basic course in a scripting language, e.g. Perl, Python or Ruby. I will try my level best to share all my hacks and techniques on this platform; if you need any special assistance in this regard, feel free to contact me.

4. Ability to design your own new tools and algorithms: If you want to be a professional-level bioinformatician then you must have a good grip on a programming language, scripting languages and databases to implement your ideas, as well as the statistics and mathematics skills that will help you design your own algorithms. Usually, this level depends on many other factors, and you will have to join a professional team after achieving the basic skills. Major limitations include the availability of data for designing a tool or algorithm, followed by computational facilities, experience from different fields, etc.
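As a taste of the data manipulation described in point 3, here is a small sketch of the kind of script you end up writing all the time: it computes the GC fraction of every record in FASTA-formatted input. The function name `gc_content_per_record` and the example records are made up for illustration. The input is processed line by line, so the same approach should also work on large files without loading them into memory.

```python
def gc_content_per_record(lines):
    """Yield (record_id, GC fraction) for each record in FASTA-formatted lines."""
    name, gc, total = None, 0, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if name is not None:
                yield name, (gc / total if total else 0.0)
            # The record ID is the first word after '>'
            name = (line[1:].split() or [""])[0]
            gc, total = 0, 0
        else:
            seq = line.upper()
            gc += seq.count("G") + seq.count("C")
            total += len(seq)
    # Emit the last record
    if name is not None:
        yield name, (gc / total if total else 0.0)

# Made-up records, only for demonstration
example = [">seq1 demo", "ATGC", "GGCC", ">seq2", "ATAT"]
for record_id, fraction in gc_content_per_record(example):
    print(record_id, round(fraction, 2))
```

To run it on a real file, pass an open file handle instead of the list, since a file handle also yields one line at a time.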



Bioinformatics is an interdisciplinary field that combines biology, computer science and mathematics to analyze and interpret biological data. As with any interdisciplinary field, its borders can't be drawn easily, and pioneers in the field have provided many different definitions. Many other fields also overlap substantially with bioinformatics, e.g. biomedical engineering, systems biology and computational biology. All these fields overlap each other frequently, and researchers produce their own definitions depending on their main goals. The definitions of bioinformatics given by experts vary with each expert's domain of research. I have summarized some major definitions below:

A tight definition of bioinformatics is offered by Fred J. Tekaia at the Institut Pasteur: "The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information".

A looser definition of bioinformatics is given by Richard Durbin, Head of Informatics at the Wellcome Trust Sanger Institute: "I do not think all biological computing is Bio-Informatics, e.g. mathematical modeling is not Bio-Informatics, even when connected with biology-related problems. In my opinion, Bio-Informatics has to do with management and the subsequent use of biological information, particular genetic information."

The NIH Biomedical Information Science and Technology Initiative Consortium agreed on the following definitions of Bio-Informatics and computational biology recognizing that no definition could completely eliminate overlap with other activities or preclude variations in interpretation by different individuals and organizations.

Bio-Informatics is research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

Computational Biology is the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.

National Center for Biotechnology Information (NCBI 2001) defines bioinformatics as: "Bio-Informatics is the field of science in which biology, computer science, and information technology merge into a single discipline. There are three important sub-disciplines within bio-informatics:

The development of new algorithms and statistics with which to assess relationships among members of large data sets

The analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures

The development and implementation of tools that enable efficient access and management of different types of information"

Finally, the Oxford dictionary defines bioinformatics as “The science of collecting and analyzing complex biological data such as genetic codes.”

Each definition of bioinformatics is as good as the other. This is just the nature of the beast. Please feel free to share your thoughts on the topic in the comments.



© 2018 BioinfoGuide. All Rights Reserved.