
Programming Languages of Bioinformatics

Nearly every programming language can be used in bioinformatics. However, certain languages serve specialized functions, and some are more widely used than others. For example, SQL is commonly used for databases and information retrieval, while Python and Perl are scripting languages used to process biological data.

The most popular programming languages in bioinformatics are Perl, Python, Java, C, and C++. In a comparison of these languages on common bioinformatics tasks, including parsing BLAST output, C and C++ proved to be the fastest and used the least memory [2]. Despite their efficiency, programs written in these languages required many more lines of code, and the languages are not as flexible as Java, Perl, or Python.

Perl is the most established language in bioinformatics. It is the language of BioPerl, a collection of Perl modules for bioinformatics applications, and it played a significant part in the Human Genome Project [5,6]. However, because so many programmers have added new features to Perl, the language can sometimes be unclear.

Python is the easiest of these languages to code in, but it is much slower than its contemporaries, and many computer scientists criticize it for teaching beginning programmers bad habits. Like Perl, Python is extremely flexible and has its own collection of modules for bioinformatics, Biopython [1]. Python, however, is kept more up to date than Perl, which has recently become somewhat outdated.
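To give a feel for what processing biological data in a scripting language looks like, here is a minimal sketch in plain Python. It deliberately does not use Biopython; the function names `read_fasta` and `gc_content` and the example records are made up for illustration, and the parser only handles simple, well-formed FASTA text.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = ""
        elif header is not None:
            # sequence lines may wrap, so append to the current record
            records[header] += line.upper()
    return records


def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 if seq is empty)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Made-up example records, for illustration only.
example = """>seq1
ATGCGC
GGTA
>seq2
ATATAT
"""

records = read_fasta(example)
for name, seq in records.items():
    print(name, len(seq), round(gc_content(seq), 2))
```

A library such as Biopython covers the same ground with far more robust parsing; the point here is only that a useful script fits in a few lines.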

Java is a good computing language for beginners because it is very structured. BioJava is a collection of Java modules for bioinformatics programs and is currently the largest collection of programming tools for bioinformatics [4]. However, despite Java’s speed and popularity, it isn’t as flexible as Python or Perl. Regardless, Java is one of the best starting languages for new bioinformatics researchers.

One of the most important languages in bioinformatics is R, a multi-paradigm language used for statistics and statistical graphics. Bioconductor is an open-source bioinformatics project based on R that is useful for analyzing genomic data gathered in wet labs [3].

In addition to BioJava and BioPerl, several other bioinformatics libraries support standard research tasks in their respective programming languages; these include BioPHP, BioRuby, and BioRails.

As for data management and databases, SQL remains the standard language.
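As a rough illustration of SQL used for information retrieval, the sketch below runs a query from Python using the standard library's `sqlite3` module. The `genes` table, its columns, the rows, and the helper `genes_longer_than` are all hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database; the table, columns, and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE genes (name TEXT PRIMARY KEY, organism TEXT, length_bp INTEGER)"
)
conn.executemany(
    "INSERT INTO genes VALUES (?, ?, ?)",
    [
        ("geneA", "E. coli", 1200),
        ("geneB", "E. coli", 800),
        ("geneC", "E. coli", 3100),
        ("geneD", "S. cerevisiae", 2500),
    ],
)


def genes_longer_than(connection, organism, min_length):
    """Return the names of genes from one organism longer than min_length bp."""
    rows = connection.execute(
        "SELECT name FROM genes "
        "WHERE organism = ? AND length_bp > ? "
        "ORDER BY name",
        (organism, min_length),  # parameters are bound, not pasted into the SQL
    )
    return [row[0] for row in rows]


print(genes_longer_than(conn, "E. coli", 1000))
```

The filtering and sorting are expressed in the query itself, so the surrounding script only has to pass parameters and collect results.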

[1] Cock PJA et al. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 25(11):1422-1423.

[2] Fourment M & Gillings MR (2008). A comparison of common programming languages used in bioinformatics. BMC Bioinformatics. 9:82.

[3] Gentleman RC et al. (2004). Bioconductor: open software development for computational biology and bioinformatics. Genome Biology. 5:R80.

[4] Holland RC et al. (2008). BioJava: an open-source framework for bioinformatics. Bioinformatics. 24(18):2096-2097.

[5] Stajich JE et al. (2002). The Bioperl toolkit: Perl modules for the life sciences. Genome Research. 12(10):1611-1618.

[6] Stein L (1996). How Perl saved the human genome project. The Perl Journal. 1(2).

External Sites
biojava.org The official wiki of BioJava.
bioperl.org The official wiki of BioPerl.
biopython.org The official wiki of Biopython.



Filed under Bioinformatics

Learn Bioinformatics: Rosalind

Rosalind (named after Rosalind Franklin, the scientist whose X-ray diffraction work was central to determining the structure of DNA) is an educational site that aims to teach bioinformatics to anyone for free.

Rosalind uses Python, which may be a setback for those without programming experience, since the site assumes the user is already acquainted with the language. However, anyone experienced in another language such as Java can pick up Python with relative ease. For those who would like to learn Python, there are many online tutorials.

In addition to teaching bioinformatics, the website has instructional problems on computational algorithms and on using bioinformatics resources such as GenBank.
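For a sense of the level involved, introductory exercises of this kind typically ask for things like counting the bases in a DNA string or producing its reverse complement. The sketch below shows one way to write both in Python; the function names and the sample string are my own choices for illustration, not an official solution.

```python
from collections import Counter


def count_nucleotides(dna):
    """Return the counts of A, C, G and T in dna, in that order."""
    counts = Counter(dna.upper())
    # Counter returns 0 for bases that never appear
    return tuple(counts[base] for base in "ACGT")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    complement = str.maketrans("ACGT", "TGCA")
    # complement each base, then reverse the whole string
    return dna.upper().translate(complement)[::-1]


sample = "AAAACCCGGT"  # made-up input
print(*count_nucleotides(sample))   # 4 3 2 1
print(reverse_complement(sample))   # ACCGGGTTTT
```

Neither function validates its input; characters other than A, C, G, and T are ignored by the count and passed through unchanged by the complement.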

Although the primary language of the website is Python, I also like to use it to practice my other computing languages, particularly Java and Perl.

External Links
rosalind.info is the official Rosalind website
python.org is the official Python site
learnpython.org is an online tutorial for learning Python (does not require prior programming experience)
