Tools to text mine PubMed

A major goal of bioinformatics is to assist biologists in their basic tasks. One of the most repetitive and laborious of these tasks is reviewing the literature on a problem. Usually this is done by searching each key term in PubMed, and sometimes there are hundreds of key terms. This approach also has limitations because of server load and the search interface. My biologist friends frequently ask me to text mine PubMed for their key terms and give them the data in an Excel sheet. This gives them a bird's-eye view of the literature on their hypothesis and genes of interest. NCBI provides many different tools to text mine PubMed, and the best thing about them is that the analyzed data already exists and can be downloaded easily from the relevant website. One just has to parse the data from these files. The complete list of tools and software available for this purpose is provided at the following link. The major tools include:

  • PubTator: I found this tool to be the most useful resource for text mining the complete PubMed for key biological entities, e.g. diseases and genes. The tool is freely available here. It can be accessed through an API, but I prefer to download all the files from the FTP site and parse them for faster and more customized results. I will discuss and share the code for basic parsing of PubTator files in a later article. An example of what you can do with these resources is available on this website at http://bioinfoguide.com/index.php/tools; this was the basic tool I developed in early 2018 for reviewing the literature on some of my genes of interest.

This tool contains all genes and their relevant information. If you click on any gene name, you will get a list of all genes with which this gene is reported, along with all the PubMed articles up to the time the database was last updated (early 2018; I plan to update the database soon).


  • LitVar: This is a great resource for people working in genetics; it allows retrieval of variant-related information from the biomedical literature. It links the key biological features of a variant with genes, diseases and drugs.
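
As a rough illustration of the download-and-parse approach described for PubTator above, here is a minimal Python sketch. It assumes the bulk annotation file is tab-separated with five columns (PMID, entity type, concept ID, mentions joined by "|", and resource). That layout, the function names collect_pmids and write_summary, and the sample rows are assumptions made for illustration, so check the README on the FTP site before running anything like this on real files.

```python
import csv
import io
from collections import defaultdict

# ASSUMED column layout of a PubTator bulk annotation file (tab-separated).
# Verify against the README on the FTP site; it may differ between releases.
COLUMNS = ["pmid", "type", "concept_id", "mentions", "resource"]


def collect_pmids(lines, key_terms):
    """Map each key term to the sorted PMIDs whose mentions contain it.

    Matching is an exact, case-insensitive comparison against each mention.
    """
    wanted = {term.lower(): term for term in key_terms}
    hits = defaultdict(set)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip malformed lines
        record = dict(zip(COLUMNS, row))
        if not record["pmid"].isdigit():
            continue  # skip a header line or a broken record
        for mention in record["mentions"].split("|"):
            term = wanted.get(mention.strip().lower())
            if term is not None:
                hits[term].add(record["pmid"])
    return {term: sorted(hits[term], key=int) for term in key_terms}


def write_summary(pmids_by_term, out):
    """Write one CSV row per key term: term, article count, PMIDs."""
    writer = csv.writer(out)
    writer.writerow(["term", "n_articles", "pmids"])
    for term, pmids in pmids_by_term.items():
        writer.writerow([term, len(pmids), ";".join(pmids)])


# Made-up sample rows in the assumed layout (the PMIDs are not real).
SAMPLE = [
    "1001\tGene\t7157\tTP53|p53\tGNormPlus",
    "1002\tGene\t7157\tp53\tGNormPlus",
    "1002\tGene\t672\tBRCA1\tGNormPlus",
    "1003\tDisease\tMESH:D001943\tbreast cancer\tTaggerOne",
]

result = collect_pmids(SAMPLE, ["p53", "BRCA1", "EGFR"])
buffer = io.StringIO()
write_summary(result, buffer)
print(buffer.getvalue())
```

With a real download, an open file handle can be passed in place of SAMPLE, and the summary can be written to a .csv file that opens directly in Excel. Reading the file line by line keeps memory use low, which matters because the full annotation files are large.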

Besides these two major tools, a long list of tools for more personalized tasks is available at the following link, but I believe these two tools offer all the basic functionality needed to review the literature on any gene, disease, drug or variant, which is a very frequent request.

© 2018 BioinfoGuide. All Rights Reserved.