Programming Languages of Bioinformatics

Almost every programming language can be used in bioinformatics. However, certain languages serve special functions, and some are more widely used than others. For example, SQL is commonly used for databases and information retrieval, while Python and Perl are scripting languages used to process biological data.

The most popular programming languages in bioinformatics are Perl, Python, Java, C, and C++. In a comparison of these languages on bioinformatics tasks such as BLAST, C and C++ proved to be the fastest and used the least memory [2]. Despite their efficiency, programs in these languages require many more lines of code and are not as flexible as those written in Java, Perl, or Python.

Perl is the most established language in bioinformatics. It is the language of BioPerl, a collection of Perl modules for bioinformatics applications, and it played a significant part in the Human Genome Project [5,6]. However, because so many programmers have added new features to Perl, it can sometimes be an unclear language to read.

Python is the easiest of these languages to write, but it is much slower than compiled languages such as C and C++, and some computer scientists criticize it for teaching beginning programmers bad habits. Like Perl, Python is extremely flexible and has its own collection of modules for bioinformatics, Biopython [1]. Python is, however, considerably more up to date than Perl, which has recently become somewhat outdated.
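To give a feel for the kind of scripting described above, here is a minimal sketch in plain Python that computes the GC content and the reverse complement of a DNA sequence. It deliberately uses only the standard library rather than Biopython; the function names and the example sequence are hypothetical and chosen purely for illustration.

```python
# Illustrative sketch only: the function names and the example sequence
# are hypothetical and not taken from any library.

# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    dna = "ATGCGC"  # made-up example sequence
    print(gc_content(dna))
    print(reverse_complement(dna))
```

A library such as Biopython wraps tasks like these in ready-made sequence objects and file parsers, so a script of this size is mainly useful for learning or for quick one-off checks.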

Java is a good computing language for beginners because it is a very structured language. BioJava is a collection of Java modules for bioinformatics programs and is currently the largest collection of programming tools for bioinformatics [4]. However, despite Java’s speed and popularity, it isn’t as flexible as Python or Perl. Regardless, Java is one of the best starting languages for amateur bioinformatics researchers.

One of the most important languages of bioinformatics is R, a multi-paradigm language used for statistics and statistical graphics. Bioconductor is an open-source bioinformatics project based on R that is useful for analyzing genomic data gathered from wet labs [3].

In addition to BioJava and BioPerl, several other bioinformatics libraries are used to perform standard research tasks in their respective programming languages; these include BioPHP, BioRuby, and BioRails.

As for data management and databases, SQL remains the standard language.
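As a rough illustration of SQL used for information retrieval, the sketch below uses Python's built-in sqlite3 module to create a small in-memory table and query it. The table name, column names, and gene records are all made up for the example and do not come from any real database.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, made-up gene table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE genes ("
        "symbol TEXT PRIMARY KEY, organism TEXT, length_bp INTEGER)"
    )
    # Hypothetical records, for illustration only.
    conn.executemany(
        "INSERT INTO genes VALUES (?, ?, ?)",
        [
            ("geneA", "E. coli", 1200),
            ("geneB", "E. coli", 450),
            ("geneC", "S. cerevisiae", 2300),
        ],
    )
    conn.commit()
    return conn


def genes_longer_than(conn: sqlite3.Connection, min_length: int) -> list:
    """Return the symbols of genes longer than min_length base pairs."""
    rows = conn.execute(
        "SELECT symbol FROM genes WHERE length_bp > ? ORDER BY symbol",
        (min_length,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(genes_longer_than(conn, 1000))
```

The query itself is ordinary SQL; Python only supplies the connection and passes the length threshold as a bound parameter.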

[1] Cock PJA et al. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 25(11):1422-1423.

[2] Fourment M & Gillings MR (2008). A comparison of common programming languages used in bioinformatics. BMC Bioinformatics. 9:82.

[3] Gentleman RC et al. (2004). Bioconductor: open software development for computational biology and bioinformatics. Genome Biology. 5:R80.

[4] Holland RC et al. (2008). BioJava: an open-source framework for bioinformatics. Bioinformatics. 24(18):2096-2097.

[5] Stajich JE et al. (2002). The Bioperl toolkit: Perl modules for the life sciences. Genome Research. 12(10):1611-1618.

[6] Stein L (1996). How Perl saved the Human Genome Project. The Perl Journal. 1(2).

External Sites
biojava.org The official wiki of BioJava.
bioperl.org The official wiki of BioPerl.
biopython.org The official wiki of Biopython.

