Reading files is an integral part of bioinformatics, because it is not practical to search for relevant information in large files by hand. There are many ways to read a file in different programming languages. In bioinformatics, file reading is usually done with scripting languages such as Perl, Python or Ruby rather than heavier languages and platforms such as Java or .NET. Scripting languages are prevalent because they are easy to learn and very flexible, and they do not need a large set of prerequisites. I prefer to use Perl for my basic tasks. In this section I am going to share some basic code that I use for data parsing. Most of the code is in Perl, and it is easy to try. Perl can be downloaded for your operating system from https://www.perl.org/get.html. Perl code can be written in plain Notepad and saved with a .pl extension instead of .txt. For easier coding and for opening large files, I recommend Notepad++ (https://notepad-plus-plus.org/download/v7.7.1.html) rather than plain Notepad.

I will start by reading a few lines of a simple VCF file. Suppose we want to read the following VCF file using Perl. It can be downloaded directly from here (Test.vcf).

Alternatively, copy the following lines into Notepad++ and save them as Test.vcf.



chrM 146 . T C 6797.77 . AC=2;AF=1.00;AN=2;DP=134;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=34.24;SOR=0.886 GT:AD:DP:GQ:PL 1/1:0,134:134:99:6826,469,0
chrM 150 . T C 6657.77 . AC=2;AF=1.00;AN=2;DP=130;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=30.63;SOR=1.127 GT:AD:DP:GQ:PL 1/1:0,130:130:99:6686,422,0
chrM 152 . T C 5780.77 . AC=2;AF=1.00;AN=2;BaseQRankSum=-0.183;ClippingRankSum=0.000;DP=131;ExcessHet=3.0103;FS=3.849;MLEAC=2;MLEAF=1.00;MQ=60.00;MQRankSum=0.000;QD=29.09;ReadPosRankSum=1.740;SOR=0.849 GT:AD:DP:GQ:PL 1/1:1,130:131:99:5809,353,0
chrM 195 . C T 4560.77 . AC=2;AF=1.00;AN=2;BaseQRankSum=3.317;ClippingRankSum=0.000;DP=117;ExcessHet=3.0103;FS=2.994;MLEAC=2;MLEAF=1.00;MQ=60.00;MQRankSum=0.000;QD=32.93;ReadPosRankSum=2.438;SOR=0.150 GT:AD:DP:GQ:PL 1/1:2,115:117:99:4589,282,0
chrM 302 . AC A 706.73 . AC=2;AF=1.00;AN=2;BaseQRankSum=-0.764;ClippingRankSum=0.000;DP=137;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQRankSum=0.000;QD=25.24;ReadPosRankSum=2.234;SOR=4.407 GT:AD:DP:GQ:PL 1/1:2,26:28:32:744,32,0
chrM 410 . A T 5991.77 . AC=2;AF=1.00;AN=2;DP=148;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=30.05;SOR=5.283 GT:AD:DP:GQ:PL 1/1:0,148:148:99:6020,445,0
chrM 495 . AC A 1435.73 . AC=2;AF=1.00;AN=2;DP=241;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=31.91;SOR=1.714 GT:AD:DP:GQ:PL 1/1:0,45:45:99:1473,137,0




Perl code to open and read the file. Copy the following lines into Notepad++ and save them as Test_1.pl.

use strict;
use warnings;

# $in is the filehandle; any variable name can be used. "Test.vcf" works if the file is in the same directory as the script; otherwise give the full path.
open(my $in, "<", "Test.vcf") or die "Cannot open Test.vcf: $!";

# while repeats the block in curly braces for as long as there are lines left in the file
while (my $line = <$in>) {
    print $line;   # print the line that was just read to the cmd screen
    sleep(2);      # pause for 2 seconds after each line; without it the output scrolls past too quickly to follow
}

close($in);
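Once a line has been read, the next step is usually to split it into its columns. The following is a minimal sketch of that step, not part of Test_1.pl. The column names follow the VCF specification, while the function name parse_vcf_line and the label SAMPLE for the last column are names chosen here for illustration. It splits on any whitespace, on the assumption that the tabs of the original file may have become spaces when the lines were copied.

```perl
use strict;
use warnings;

# Column names from the VCF specification. "SAMPLE" is a label chosen
# here for the single sample column in Test.vcf.
my @columns = qw(CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE);

# Turn one VCF data line into a hash reference keyed by column name.
# Hypothetical helper for illustration; extra columns are ignored.
sub parse_vcf_line {
    my ($line) = @_;
    # split ' ' splits on any run of whitespace (tabs or spaces) and
    # ignores leading whitespace and the trailing newline.
    my @fields = split ' ', $line;
    my %record;
    @record{@columns} = @fields;
    return \%record;
}

# First record of Test.vcf, used as sample input.
my $line = "chrM 146 . T C 6797.77 . AC=2;AF=1.00;AN=2;DP=134;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=34.24;SOR=0.886 GT:AD:DP:GQ:PL 1/1:0,134:134:99:6826,469,0\n";

my $rec = parse_vcf_line($line);

# Prints: chrM, 146, T>C and 6797.77 separated by tabs
print join("\t", $rec->{CHROM}, $rec->{POS},
           $rec->{REF} . ">" . $rec->{ALT}, $rec->{QUAL}), "\n";
```

Inside the while loop of Test_1.pl, the same function could be called on each $line instead of printing the whole line.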



The Perl code file can be downloaded from here (Test_1.pl). In the next sections I will share my code for data manipulation. Once the data has been read, we can process it in any way we want and save it in any format we want. If you have any questions, please let me know through the contact form or leave a comment below.
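As a small taste of what "doing anything we want" with the data can look like, here is a rough sketch that pulls two values (DP and QD) out of the INFO column and prints a tab-delimited summary. It is only an illustration under some assumptions: the helper name parse_info and the choice of output columns are mine, and the two records are held in a string so that the example runs without Test.vcf. To use the real file, open "Test.vcf" as in Test_1.pl instead.

```perl
use strict;
use warnings;

# Split an INFO string such as "AC=2;AF=1.00;DP=134" into a hash reference.
# Entries without "=" (flags, which the VCF format allows) are stored as 1.
# Hypothetical helper for illustration.
sub parse_info {
    my ($info) = @_;
    my %info;
    for my $entry (split /;/, $info) {
        my ($key, $value) = split /=/, $entry, 2;
        $info{$key} = defined $value ? $value : 1;
    }
    return \%info;
}

# Two records from Test.vcf, kept in a string so the sketch is self-contained.
my $vcf_text = <<'END';
chrM 146 . T C 6797.77 . AC=2;AF=1.00;AN=2;DP=134;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=34.24;SOR=0.886 GT:AD:DP:GQ:PL 1/1:0,134:134:99:6826,469,0
chrM 302 . AC A 706.73 . AC=2;AF=1.00;AN=2;BaseQRankSum=-0.764;ClippingRankSum=0.000;DP=137;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQRankSum=0.000;QD=25.24;ReadPosRankSum=2.234;SOR=4.407 GT:AD:DP:GQ:PL 1/1:2,26:28:32:744,32,0
END

# Read the string through a filehandle, exactly as a file would be read.
open(my $in, "<", \$vcf_text) or die "Cannot read VCF text: $!";

print join("\t", qw(CHROM POS REF ALT DP QD)), "\n";   # header row

while (my $line = <$in>) {
    next if $line =~ /^#/;           # skip VCF header lines, if present
    my @fields = split ' ', $line;   # split on tabs or spaces
    next unless @fields >= 8;        # need at least the INFO column
    my $info = parse_info($fields[7]);
    # Use "." when a key is missing from INFO
    my @values = map { defined $info->{$_} ? $info->{$_} : "." } qw(DP QD);
    print join("\t", @fields[0, 1, 3, 4], @values), "\n";
}
close($in);
```

With the two records above, this prints a header row followed by one row per variant: chrM, 146, T, C, 134, 34.24 and chrM, 302, AC, A, 137, 25.24. To save the table instead of printing it, the print statements could write to a second filehandle opened with ">" on an output file of your choice.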





© 2018 BioinfoGuide. All Rights Reserved.