From Genes to Proteins: A New Level of Complexity
Jan 1, 2009

Newspapers frequently run articles reporting a study about a gene linked to some disease. Thanks to such wide media coverage, the word "gene" has become a household term for most of us. Genetics, the study of genes, probably owes much of its popularity to a female sheep you are all familiar with: yes, I mean Dolly, the first mammal successfully cloned from an adult body cell.

We inherit our characteristics from our parents. The basic unit responsible for this inheritance is the gene. More technically, a gene is a hereditary unit consisting of a sequence of DNA that occupies a specific location on a chromosome and determines a particular characteristic in an organism. Genes are like words on the long string of DNA. The description of the fundamental process of synthesizing proteins from the information on genes is called the "Central Dogma." According to this dogma, DNA is used to synthesize RNA, and in turn, RNA is used to synthesize proteins. Hence, this dogma dictates the link between genes and proteins. Proteins are actually a translated and three-dimensional version of the linear information stored in genes.
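The two steps of the Central Dogma can be sketched in a few lines of Python. This is only a toy illustration: the codon table is deliberately partial (the real genetic code has 64 codons), the gene sequence is made up, and the function names `transcribe` and `translate` are chosen here for illustration.

```python
# Partial codon table: only the codons needed for the toy gene below.
# The real genetic code has 64 codons.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start signal
    "GCU": "Ala",
    "UGG": "Trp",
    "AAA": "Lys",
    "UAA": None,   # stop codon: ends the protein
}


def transcribe(coding_dna: str) -> str:
    """DNA -> RNA. Read from the coding strand, this amounts to replacing T with U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """RNA -> protein. Read three letters (one codon) at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return protein


dna = "ATGGCTTGGAAATAA"  # hypothetical toy gene, not a real sequence
mrna = transcribe(dna)
protein = translate(mrna)

print(mrna)               # AUGGCUUGGAAAUAA
print("-".join(protein))  # Met-Ala-Trp-Lys
```

The output is still a linear chain of amino acids; the folding of that chain into a three-dimensional protein is not modeled here.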

The Structure of DNA

DNA (deoxyribonucleic acid) is our repository of genetic information. Although RNA viruses use RNA (ribonucleic acid) as their genetic material, all other living organisms inherit their genes through DNA. Hence, DNA is vital for the existence and perpetuation of life on Earth.

In a simple comparison, DNA can be likened to a sequence of letters where each letter is a single nucleotide, and the alphabet has only four letters: A, T, C and G. Although this alphabet is extremely small compared to those used in human communication today, we are still capable of capturing the vast size of human DNA with this analogy: Our DNA is composed of a sequence of nearly 3 billion (3,000,000,000) of these letters. What this means is that, if you were to type out your genetic code, you would have a 5,000-volume encyclopedia, with each volume containing 400 pages, and each page having 1,500 letters! But then, how do we even fit this formidable amount of information in every single cell of our body? The answer lies in the astonishing folding, packaging and wrapping steps DNA goes through upon synthesis. Stretched out end to end, the DNA in a single cell would span about 6 feet (~2 meters). However, after all the packaging steps, DNA becomes compact enough to fit not only in a cell, but in the microscopic nucleus of each cell.
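The encyclopedia figures and the 2-meter length can be checked with a quick calculation. The volume, page and letter counts come from the text; the spacing of roughly 0.34 nanometers per base pair and the two genome copies per cell are assumptions added here (commonly quoted textbook values, not figures from this article).

```python
# Figures from the article's encyclopedia analogy.
volumes = 5_000
pages_per_volume = 400
letters_per_page = 1_500

total_letters = volumes * pages_per_volume * letters_per_page
print(total_letters)  # 3000000000

# Assumptions not stated in the article:
# - about 0.34 nanometers between neighboring base pairs,
# - two copies of the genome per body cell (one from each parent).
NM_PER_BASE_PAIR = 0.34
GENOME_COPIES_PER_CELL = 2

length_m = total_letters * GENOME_COPIES_PER_CELL * NM_PER_BASE_PAIR * 1e-9
print(round(length_m, 2))  # 2.04
```

Under these assumptions the encyclopedia holds exactly 3 billion letters, and the stretched-out DNA of one cell comes to roughly 2 meters, in line with the figure above.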

Genes and the Human Genome Project

Unfortunately for our alphabet analogy above, the 3 billion nucleotides in DNA do not contain any spaces to let us know where each word begins and ends. The Human Genome Project accomplished the task of unraveling what these 3 billion letters are (each one is one of A, T, C and G), and this was a major achievement of humanity. However, it was not until then that we realized the real challenge DNA posed us: Where were the genes in DNA? In other words, how would we understand the words and sentences in this string of 3 billion letters? Apart from efforts to discover the DNA sequences of other organisms, it is not unfair to say that the interest and workforce once focused on the Human Genome Project has now almost completely shifted to this latter "real" challenge of discovering the genes in DNA.
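To give a flavor of what "finding the words" means, here is a deliberately naive sketch that looks for stretches beginning with the start codon ATG and ending at the first in-frame stop codon. It is not how real gene-finding software works: it ignores the opposite DNA strand, the interruptions (introns) found in most human genes, and regulatory signals. The function name, the `min_codons` parameter and the test sequence are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_open_reading_frames(dna: str, min_codons: int = 3) -> list[str]:
    """Return stretches that start with ATG and end at the first in-frame stop codon.

    min_codons counts every codon in the stretch, including start and stop.
    Only the given strand is scanned.
    """
    dna = dna.upper()
    found = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk forward one codon at a time, staying in the same reading frame.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                candidate = dna[start:pos + 3]
                if len(candidate) // 3 >= min_codons:
                    found.append(candidate)
                break
    return found


sequence = "CCATGGCTTGGAAATAAGG"  # hypothetical toy sequence
print(find_open_reading_frames(sequence))  # ['ATGGCTTGGAAATAA']
```

Even this toy version shows why the problem is hard: the "word boundaries" have to be inferred from patterns in the letters themselves, since nothing in the sequence marks them explicitly.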

How we wish life could be that easy! Just as completing the human DNA sequence made us realize that we did not know where the genes are, discovering some genes allowed us to understand that we would still be missing a major part of the picture even if we knew exactly where each gene was. Do we not frequently encounter instances in everyday life where one word means different things depending on context? So, is there any good reason to think that genes on our chromosomes will be any less complex? Unfortunately not. Quite to the contrary, the sense is growing that genes are actually far more complex and intricate than we originally thought. For one thing, a single gene may not cause an immediate effect, but may interact with a network of other genes to produce the final effect. Diseases that are caused by individual genes are actually very few, a famous example being cystic fibrosis. But diseases that are affected by the interaction of many genes are far more numerous and prevalent, for example, breast cancer, Alzheimer's disease, Type 1 diabetes mellitus, multiple sclerosis and obesity.

This latter group of diseases is appropriately called "complex diseases." Efforts are under way to decipher the intricate genetic and protein networks responsible for such diseases; however, there are so many (known and also unknown) variables that biologists have already called for help. Research problems such as complex diseases that require the interaction of biologists, mathematicians, computer scientists and statisticians alike have led to the advent of the currently very popular field of "Systems Biology." Viewing the cell as a large factory, this field aims to understand all molecular networks and interactions that make up the very sophisticated machinery in living systems. After deciphering how cells operate flawlessly as a complex system, humans will be better able to discover causes of diseases, and will also be in a much better position to manipulate cells to cure diseases.

The idea of manipulating cells and cell components such as genes and proteins has actually led to "Synthetic Biology," which is, in essence, the engineering approach to Systems Biology. Synthetic biologists try to engineer gene and protein networks in the cellular machinery to program cells for synthesizing custom-tailored molecules. This can be in the form of redesigning or producing mass amounts of existing molecules, or synthesizing nonexistent molecules that have medical or other potential uses. The overall significance of the field can be well understood by the following quote from one of the pioneers of the field, UC Berkeley professor Jay Keasling: "(Synthetic biology is) doing for biology what electrical engineering did for physics and what chemical engineering has done for chemistry."

One example of synthetic biology comes from Jay Keasling's lab. In collaboration with the Gates Foundation and OneWorld Health, the first nonprofit pharmaceutical company in the US, Dr. Keasling's lab is engineering a new metabolic pathway in E. coli to produce the precursor to artemisinin, currently the most effective treatment for malaria. The prospects include a drastic drop in cost, from dollars to dimes. Moreover, success in redesigning a metabolic pathway in bacteria holds great promise for repeating the approach with other similar pathways important for the pharmaceutical, cosmetics and food industries.

Genomics vs. Proteomics

Molecular biologists today are inundated with neologisms ending with the suffixes "-ome" and "-omics." As a consequence, the expression "-omics craze" has found its place in the everyday language of these scientists. Basically, the suffix "-ome" refers to a totality of some sort. All the genes considered as a whole in an organism's cell are called the "genome," and similarly all the proteins this genome can synthesize are referred to as the "proteome." "Genomics" and "proteomics" refer to the study of the relevant "-ome," as opposed to studying genes and proteins one by one.

Even though there exist so many -omics words in the literature these days, genomics and proteomics remain the most popular and useful ones. Proteomics can be thought of as the natural successor to genomics because it is fundamentally the next level of complexity after genomics. While scientists explore gene networks and their interactions in genomics, proteomics involves the study of all the proteins and their interactions in the cellular machinery of an organism. Unfortunately, the next level of complexity does not mean "linearly more complex" in this case; studying networks of three-dimensional molecules is an immensely more daunting task than studying those of one-dimensional DNA sequences. However, luckily for us, scientists are up to this challenge. Yet again, we observe a shift in focus in the scientific community from genomics to proteomics.

The main motivation for this shift can be roughly understood with an analogy from marketing or another one from military warfare. In the former, if you want a better marketing strategy for your product, you should target end-users first and foremost. Understanding the behavioral patterns and preferences of end-users is much more important than understanding the likes of your vendors, because eventually it is the end-user who will determine the demand for your product. In the latter analogy, we think of an army of soldiers who receive orders from a general commander; however, these orders can later be modified or completely annulled by orders from other commanders lower in the hierarchy. If you think about how unreliable and uninformative it is to know only the orders that each soldier received from the general commander, you will understand how limited information on genes is without supplementary information on proteins. Gene products, either RNAs or proteins, may undergo further steps, called "post-transcriptional modification" for RNAs and "post-translational modification" for proteins, that are not completely understood, and worse yet may not be completely deterministic (implying random factors).

So, with the help of the analogies mentioned above, we can reason that the shift in focus of the scientific community from genomics to proteomics is mainly due to the fact that biological functions are carried out, not by DNA or genes, but by proteins and (although much less frequently than by proteins) by RNA molecules. For medical and other practical purposes, it is more important to acquire information on the proteome than on the genome. This, of course, is not to suggest underestimating the importance of the genome. The genome preserves its significance as the origin and source of genetic information. It is just not as beneficial to think about the genome without looking at the final product, that is, the proteome.


The completion of the rough draft of the Human Genome Project in 2000 marked the end of the Genetic Era and paved the way to the Genomic Era. The breakthroughs that have taken place since this milestone have been breathtaking, awe-inspiring and maybe even hard to keep up with. The Genomic Era has given birth to different fields in a span of a few years, and the biological scientific community has had to shift its focus from genomics to proteomics even without having sorted out the puzzles of the genome. The advent of the "-omics craze" was probably a by-product of this shift because suddenly each sub-field of molecular biology had to adopt a holistic approach in its explorations. Investigating a single entity, whether it be a gene or a protein or another molecule, quickly became stigmatized as "obsolete."

This transition to a holistic approach has resulted in the interaction of biologists with scientists from quantitative fields such as mathematics, statistics and computer science. These interactions gave rise to truly interdisciplinary research fields such as systems biology, synthetic biology and computational biology. More and more scientists today believe that competence in the future will rely on incorporating expertise from these different fields. With each new discovery, realizing the level of complexity and the intricacy in the design of our body leaves us in true awe. Moreover, these discoveries only make it easier for us to grasp how little we know about the miraculous design of biological systems. On the other hand, this awareness makes us even more motivated to delve into scientific efforts because understanding the science behind creation takes us directly to the understanding of our Creator.

Jason Newfoundland is a PhD candidate in Bioinformatics at the University of Michigan.

