Tag Archives: genetics

Programming Languages of Bioinformatics

Nearly every programming language can be used in bioinformatics. However, certain languages serve specialized functions, and some are more widely used than others. For example, SQL is commonly used for databases and information retrieval, while Python and Perl are scripting languages used to process biological data.

The most popular programming languages in bioinformatics are Perl, Python, Java, C, and C++. In a comparison of these languages running bioinformatics algorithms such as BLAST, C and C++ proved to be the fastest and used the least memory [2]. Despite their efficiency, these languages required many more lines of code and are not as flexible as languages such as Java, Perl, or Python.

Perl is the most established language in bioinformatics and played a significant part in the Human Genome Project [6]. It is also the language of BioPerl, a collection of Perl modules for bioinformatics applications [5]. However, because so many programmers have added new features to Perl, its code can sometimes be unclear.

Python is the easiest of these languages to write but is much slower than its contemporaries, and many computer scientists criticize it for teaching beginning programmers bad habits. Like Perl, Python is extremely flexible and has its own collection of modules for bioinformatics, Biopython [1]. Python, however, is more actively updated than Perl, which has recently become somewhat outdated.
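To give a feel for the kind of short script Python is typically used for, here is a minimal sketch that computes GC content and the reverse complement of a DNA string. It uses only the standard library rather than Biopython, and the function names and the sample sequence are my own illustrative choices, not taken from any of the references.

```python
# Illustrative sketch only: the sequence used below is a made-up example.

# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    dna = "ATGCGCTA"  # hypothetical sequence
    print(gc_content(dna))
    print(reverse_complement(dna))
```

A library such as Biopython wraps this sort of logic in sequence objects, so the plain functions here are only meant to show how little code a basic task needs.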

Java is a good computing language for beginners as it is a very structured language. BioJava is a collection of Java modules for bioinformatics programs and is currently the largest collection of programming tools for bioinformatics [4]. However, despite Java’s speed and popularity, it isn’t as flexible as Python or Perl. Regardless, Java is definitely one of the best starting languages for amateur bioinformatics researchers.

One of the most important languages in bioinformatics is R, a multi-paradigm language used for statistics and statistical graphics. Bioconductor is an open-source bioinformatics project based on R that is useful for analyzing genomic data gathered from wet labs [3].

In addition, there are several other bioinformatics software modules besides BioJava and BioPerl which are used to perform standard research tasks using their respective programming languages; these include BioPHP, BioRuby, BioRails, etc.

As for data management and databases, SQL remains the best language.
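As a rough illustration of SQL used for storage and retrieval, the sketch below creates a tiny in-memory SQLite database from Python and queries it. The table name, column names, and helper functions are assumptions made for this example; a real bioinformatics database would have a far richer schema.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, hypothetical genes table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE genes (symbol TEXT PRIMARY KEY, chromosome TEXT)"
    )
    # A handful of well-known human genes and their chromosomes.
    conn.executemany(
        "INSERT INTO genes (symbol, chromosome) VALUES (?, ?)",
        [("TP53", "17"), ("BRCA1", "17"), ("BRCA2", "13"), ("SMAD4", "18")],
    )
    conn.commit()
    return conn


def genes_on_chromosome(conn: sqlite3.Connection, chromosome: str) -> List[str]:
    """Return gene symbols on the given chromosome, sorted alphabetically."""
    rows = conn.execute(
        "SELECT symbol FROM genes WHERE chromosome = ? ORDER BY symbol",
        (chromosome,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    connection = build_demo_db()
    print(genes_on_chromosome(connection, "17"))
    connection.close()
```

The parameterized query (the `?` placeholder) is the part worth copying: it keeps the SQL separate from the values being searched for.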

[1] Cock PJA et al. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 25(11):1422-1423.

[2] Forment M & Gillings MR (2008). A comparison of common programming languages used in bioinformatics. BMC Bioinformatics. 9:82.

[3] Gentleman RC et al. (2004). Bioconductor: open software development for computational biology and bioinformatics. Genome Biology. 5:R80.

[4] Holland RC et al. (2008). BioJava: an open-source framework for bioinformatics. Bioinformatics. 24(18):2096-7.

[5] Stajich JE et al. (2002). The Bioperl toolkit: Perl modules for the life sciences. Genome Research. 12(10):1611-8.

[6] Stein L (1996). How Perl saved the human genome project. The Perl Journal. 1(2).

External Sites
biojava.org The official wiki of BioJava.
bioperl.org The official wiki of BioPerl.
biopython.org The official wiki of Biopython.



Filed under Bioinformatics

Exploring Bioinformatics: A Project-Based Approach by Caroline St. Clair and Jonathan Visick

Exploring Bioinformatics: A Project-Based Approach by Caroline St. Clair and Jonathan Visick is the first bioinformatics book I studied.

At the time, I was very new to the field of bioinformatics (being proficient at some computer programming and knowledgeable in biology) and found the book to be relatively easy to understand. The book is aimed primarily at undergraduate students at an introductory level of bioinformatics, but due to its simplicity, I would recommend it for self-taught undergraduates, autodidacts, or high schoolers who are interested in pursuing studies in bioinformatics.

The book is divided into chapters of main topics and techniques necessary in computational genetics and bioinformatics. Each chapter has do-it-yourself type problems utilizing the basic bioinformatics algorithms and programs which are explained within the chapter. At the end of the book is a glossary of important bioinformatics jargon that all bioinformatics enthusiasts must familiarize themselves with.

One of the downsides of this book is that its code is written in Perl, although Perl is one of the main languages of bioinformatics. If the reader is not familiar with Perl or another programming language, there is an index in the back with the basic commands as well as a tutorial near the end of the book on basic Perl programming.

Currently, I have not been successful in locating a free PDF version of the book, so if you are interested in buying it, I would recommend a used copy from Amazon. However, due to its cost, I would recommend that bioinformatics enthusiasts use the Rosalind website (see more here).

External Sites

biology.jbpub.com/bioinformatics/ is the book’s official website with instructor/student resources and solutions.


Filed under Bioinformatics, Journals and Publications, Research

Bioinformatics and Healthcare

In the future of medicine, there exists a universal bioinformatics-based healthcare system in which physicians are knowledgeable in computer science technology and patient records (for example, genomes and proteomes) are stored on huge online data warehouses. Bioinformatics, more specifically translational bioinformatics, is the missing link between futuristic healthcare and modern computational technology [4]. With these electronic databases for individuals, doctors and other healthcare specialists may work more closely with pharmaceutical companies and researchers in not only detecting and curing genetic disorders but also creating personalized medicine.

Most importantly, this type of healthcare system leads to one of the holy grails of healthcare: personalized medicine [2,3,7]. With faster access to individual genetic information, physician-scientists can work more closely with pharmaceutical companies to create drugs and other treatments tailored to individual patients.

Essentially, the process of individual genome sequencing and analysis will become relatively low-cost and quick in the future [1]. By then, our understanding of the human genome and its corresponding proteome will increase tremendously as well (alongside our technological advancements). Of course, the genomic analysis will require computational methods and programs due to the enormity and complexity of the human genome. All of the data storage as well as genomic analysis and comparisons would be made possible using either government-mandated or privately-controlled online databases (with private genomic information secured as this would be as important as one’s social security). As genes govern our daily life, genomic medicine is definitely the way of the future and bioinformatics is the answer to our understanding and analysis of genomic healthcare [5].

By harnessing the power of computational biology and genetic information, doctors can detect malignant irregularities in a patient’s genome and run this information against other universally available databases of disorders to gauge whether or not the patient is at risk for a genetic disorder or cancer. This provides an extremely early detection of cancer or other congenital disorders that may affect a patient later on in his/her life and, coupled with future advancements in drug discovery or genetic treatment, patients can be treated earlier.

For example, suppose a patient is born with a congenital cerebral arteriovenous malformation (AVM), which often goes undetected at birth due to its rarity and size. The child’s genome is sequenced and analyzed in about a day for ten dollars (in this imagined future), and a genomic comparison shows that the SMAD4 gene has a mutation that is common in patients with cerebral AVM. The physician thereby discovers that the child has a malformation in his/her cerebellum, which can be treated in its early stages, the ideal time to treat AVMs.
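The lookup step in this scenario, checking a patient's variants against a reference database of known disorder-associated variants, can be sketched in a few lines. Everything here is hypothetical: the variant identifiers, the conditions, and the function name are placeholders I made up to show the idea, not real clinical data or a real database interface.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical reference table: variant identifier -> associated condition.
# These entries are placeholders, not real variants or diagnoses.
KNOWN_RISK_VARIANTS: Dict[str, str] = {
    "GENE_A:var1": "condition X",
    "GENE_B:var7": "condition Y",
}


def flag_risk_variants(
    patient_variants: Iterable[str],
    reference: Dict[str, str] = KNOWN_RISK_VARIANTS,
) -> List[Tuple[str, str]]:
    """Return (variant, condition) pairs for patient variants found in the reference.

    Duplicate variants are reported once, and results are sorted by variant
    identifier so the output is stable.
    """
    unique_variants = set(patient_variants)
    return sorted(
        (variant, reference[variant])
        for variant in unique_variants
        if variant in reference
    )


if __name__ == "__main__":
    patient = ["GENE_B:var7", "GENE_C:var2"]  # hypothetical patient variants
    for variant, condition in flag_risk_variants(patient):
        print(f"{variant} is associated with {condition}")
```

A real system would of course need far more than an exact-match lookup, but the comparison against a shared reference is the core of the idea.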

Of course, with the rise of translational bioinformatics come legal disputes over ethics (bioethics), privacy, property rights, social security, etc. [6]. I will discuss these in the future.

If you have any questions feel free to comment or contact me directly.

[1] Bonetta L (2006). Genome sequencing in the fast lane. Nature Methods. 3:141-147.

[2] Fernald GH, Capriotti E, Daneshjou R, Karczewski KJ, Altman RB (2011). Bioinformatics challenges for personalized medicine. Bioinformatics. 27(13):1741-1748.

[3] Khandekar PS (n.d.). Role of Bioinformatics in Medical Informatics A Case Study: Tuberculosis. Guest Lecture.

[4] Lopez-Alonso V, Mayer MA, Shublaq N (2012). Bridging the Gap between Bioinformatics and Medical Informatics. 24th International Conference of the European Federation for Medical Informatics.

[5] Maojo V & Kulikowski CA (2003). Bioinformatics and Medical Informatics: Collaborations on the Road to Genomic Medicine?. Journal of the American Medical Informatics Association. 10:515-22.

[6] Sethi P (2009). Translational Bioinformatics and Healthcare Informatics: Computational and Ethical Challenges. Perspectives in Health Information Management. 6(Fall).

[7] Wang X & Liotta L (2011). Clinical bioinformatics: a new emerging science. Journal of Clinical Bioinformatics. 1:1.

External Sites
American Medical Informatics Association is the major organization for translational bioinformatics (in the US).


Filed under Bioinformatics

The Sub-fields of Computational Biology


Ever since its official conception in the 1970s, bioinformatics, the excellent combination of computer science and biology, has come a long way [4]. From this interdisciplinary field sprang new fields of theoretical biology that we know of today [2].

However, bioinformatics is often confused with the now-broader field of computational biology.

As bioinformatics and computational biology grew from genomic research in the 1970s, the terms have been used interchangeably and (still) cause some degree of confusion — particularly among people unfamiliar with the fields. In 2000, the NIH Biomedical Information Science and Technology Initiative Consortium clarified the two by defining the fields as such [3]:

Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.


Bioinformatics: This is the most well-known field of computational biology. This field deals with the development and creation of databases or other methods of storing, retrieving, and analyzing biological data (originally starting with genes) through mathematical and computing algorithms. Bioinformatics employs both mathematics and an ever-increasing variety of computing languages to ease the storage and analysis of biological data. Databases themselves have made way for sprouting research fields such as data mining.

Computational Biology: Computational biology has become a broad term that refers to the application of mathematical models, computing algorithms and programs, and simulation tools to aid research in biological fields such as genetics, molecular biology, biochemistry, ecology, and neuroscience, among many others. Computational biology research encompasses many disciplines such as health informatics, comparative genomics and proteomics, protein modelling, etc.

Mathematical Biology: This field is an amalgamation of biology and various fields of mathematics. Some computational biology topics are more math-based than computer science-based. The mathematics used in mathematical biology research includes discrete mathematics, topology (also useful for computational modeling), Bayesian statistics (such as for biostatistics), linear algebra, logic, Boolean algebra, and many other areas of higher mathematics. This field is also often called theoretical biology due to its focus on equations, algorithms, and theoretical models.

Systems Biology: This field deals with the interactions between various biological systems ranging from the cellular level to entire populations with the goal of discovering emergent properties. Systems biology usually involves networking cell signalling or metabolic pathways [1]. Systems biology often employs computational techniques and biological modelling to study these complex interactions at cellular levels.
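To make the idea of a pathway network a little more concrete, here is a minimal sketch that stores a signalling pathway as a directed graph and finds everything downstream of a given component with a breadth-first search. The node names and the pathway itself are invented for illustration and do not describe any real pathway.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical signalling pathway: each key activates the nodes in its list.
PATHWAY: Dict[str, List[str]] = {
    "receptor": ["kinase_1"],
    "kinase_1": ["kinase_2", "phosphatase"],
    "kinase_2": ["transcription_factor"],
    "phosphatase": [],
    "transcription_factor": [],
}


def downstream(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every node reachable from start by following directed edges."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


if __name__ == "__main__":
    print(sorted(downstream(PATHWAY, "kinase_1")))
```

Real systems biology models usually attach rates or weights to these edges, but even a plain graph like this is enough to ask which components a perturbation could affect.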

If you have any questions or suggestions, feel free to comment or contact me!


[1] Bu Z & Callaway DJ (2011). Proteins move! Protein dynamics and long-range allostery in cell signaling. Advances in protein chemistry and structural biology. 83:163-221.

[2] Hogeweg P (2011). The Roots of Bioinformatics in Theoretical Biology. PLoS Computational Biology. 7(3):e1002021.

[3] Huerta M et al. (2000). NIH Working Definition of Bioinformatics and Computational Biology. Biomedical Information Science and Technology Initiative.

[4] Johnson G & Wu TT (2000). Kabat Database and its applications: 30 years after the first variability plot. Nucleic Acids Research. 28(1):214-218.

External Sites
bioinformatics.org this site contains more information on the Bioinformatics Organization.


Filed under Bioinformatics

Hello, World!

Welcome to my blog on bioinformatics, computational biology, and other related fields. I will be discussing topics in bioinformatics such as the future of bioinformatics and healthcare, computational techniques, and current research.


Filed under Updates