Gene Ontology (GO)

Introduction

The mission of the Gene Ontology (GO) is to improve our understanding of the functions of genes, non-coding RNAs and proteins across many different organisms.

The Gene Ontology Consortium created a biomedical ontology divided into three classes: cellular component (CC), molecular function (MF) and biological process (BP), in which the roles of genes and gene products are catalogued.

This resource has become a “must-have” in biology and medical research. It has been used in tens of thousands of publications and is constantly evolving. As of March 2023, the GO knowledge base covered more than 5,000 different organisms and 43,000 terms (27,790 BP, 11,263 MF and 4,000 CC), with almost 7,500,000 annotations (1).

History & Description

Initially created in 1998 by researchers working on different model organisms (Drosophila, Saccharomyces cerevisiae and mouse), the Gene Ontology Consortium brings together biologists and computer scientists from all over the world. Its aim is to structure biological knowledge by defining a common vocabulary to describe entities, and to annotate the relationships between these entities and genes and their products (2).

Three classes of GO terms

The GO knowledge base is structured in the form of an ontology with terms classified in 3 classes (3):

  • Cellular component (CC) describes the cellular anatomy: it refers to the place where the gene or its product is localized, for example “mitochondrial matrix – GO:0005759”.
  • Molecular function (MF) describes the activity of the gene or its product at the molecular level, with no notion of location or time, for example “lipid transporter activity – GO:0005319” or “transcription coactivator binding – GO:0001223”.
  • Biological process (BP) describes the larger process in which the gene or its product is involved, for example “fatty acid beta-oxidation – GO:0006635” or “positive regulation of DNA-templated transcription – GO:0045893”.
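As a minimal sketch of how these three classes might be represented in code, the snippet below stores the example terms from the list above together with their class, and checks that an identifier follows the usual `GO:` plus seven digits pattern. The names `Aspect`, `EXAMPLE_TERMS`, `is_valid_go_id` and `terms_in` are hypothetical and chosen for illustration only; they do not come from any GO library.

```python
import re
from enum import Enum


class Aspect(Enum):
    """The three classes of GO terms described above."""
    CELLULAR_COMPONENT = "CC"
    MOLECULAR_FUNCTION = "MF"
    BIOLOGICAL_PROCESS = "BP"


# Example terms taken from the list above: GO ID -> (name, class).
EXAMPLE_TERMS = {
    "GO:0005759": ("mitochondrial matrix", Aspect.CELLULAR_COMPONENT),
    "GO:0005319": ("lipid transporter activity", Aspect.MOLECULAR_FUNCTION),
    "GO:0001223": ("transcription coactivator binding", Aspect.MOLECULAR_FUNCTION),
    "GO:0006635": ("fatty acid beta-oxidation", Aspect.BIOLOGICAL_PROCESS),
    "GO:0045893": (
        "positive regulation of DNA-templated transcription",
        Aspect.BIOLOGICAL_PROCESS,
    ),
}

# A GO identifier is the prefix "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")


def is_valid_go_id(identifier: str) -> bool:
    """Return True if the string has the shape of a GO identifier."""
    return GO_ID_PATTERN.fullmatch(identifier) is not None


def terms_in(aspect: Aspect) -> list:
    """Return the names of the example terms that belong to one class."""
    return sorted(name for name, a in EXAMPLE_TERMS.values() if a is aspect)
```

Calling `terms_in(Aspect.MOLECULAR_FUNCTION)` returns the two molecular function examples in alphabetical order. The format check only validates the shape of an identifier, not whether the term exists in the current GO release.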

The structure of GO graph

GO terms are represented by nodes and the links between them by edges. These edges represent a hierarchy: a parent term is broader, while a child term is more specific. Terms from the 3 classes can be linked to each other.

Thanks to this structure, the GO knowledge base can be updated as new knowledge emerges; the GO Consortium calls it a “dynamic” ontology (4).

Figure 1. Visual representation of the GO ontology hierarchy with the example of the biological process “Receptor clustering – GO:0043113” from the QuickGO interface.


As an example (Figure 1), the biological process “receptor clustering – GO:0043113” has two parent terms of different classes: “plasma membrane” (cellular component, green box) and “protein localization to membrane” (biological process, blue box). It also has more specific children, such as “skeletal muscle acetylcholine-gated channel clustering – GO:0071340” or “neurotransmitter-gated ion channel clustering – GO:0072578”. Edges come in different types, such as “positively regulates” (green) and its opposite, “negatively regulates” (red). Edges can also link two classes of GO terms: here, “receptor clustering” occurs in “plasma membrane”.
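The hierarchy described in this example can be sketched as a small directed graph. The snippet below is an illustration only: the edges are transcribed from the paragraph above rather than loaded from a GO release, the relation label `is_a` is an assumption (the text only names “occurs in” explicitly), and `PARENT_EDGES`, `ancestors` and `children` are hypothetical names.

```python
from collections import deque

# child term -> list of (relation, parent term), transcribed from the
# Figure 1 description. Only "occurs_in" is named in the text; the
# "is_a" labels are an assumption made for this sketch.
PARENT_EDGES = {
    "receptor clustering": [
        ("is_a", "protein localization to membrane"),
        ("occurs_in", "plasma membrane"),
    ],
    "neurotransmitter-gated ion channel clustering": [
        ("is_a", "receptor clustering"),
    ],
    "skeletal muscle acetylcholine-gated channel clustering": [
        ("is_a", "receptor clustering"),
    ],
}


def ancestors(term, edges=PARENT_EDGES):
    """Return every broader term reachable from `term` (breadth-first)."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relation, parent in edges.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def children(term, edges=PARENT_EDGES):
    """Return the terms that list `term` as a direct parent."""
    return sorted(
        child
        for child, links in edges.items()
        if any(parent == term for _relation, parent in links)
    )
```

Walking up from a child term collects parents from both classes, which reflects the point that edges can cross between biological processes and cellular components.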

GO Annotations

An annotation describes the function, location or role of a gene or gene product (1).

It records the relationship between a gene (or its product) and a GO term (CC, MF or BP). This relationship is supported by a reference from the scientific literature and by a GO evidence code, which indicates the type of evidence, how close it is to the experimental result, and whether a biocurator generated the annotation. Annotations can be generated manually or automatically (IEA evidence code) (1, 4).

EMBL’s European Bioinformatics Institute (EMBL-EBI) has developed a web interface called QuickGO for querying both the annotations and the GO ontology (Figure 1 and Figure 2) (6).

Figure 2. Example search on the QuickGO interface for the biological processes in which the human AXL protein is involved.

In this QuickGO example (Figure 2), we searched for all the biological processes in which the human AXL protein is involved; the search returned 62 biological processes. In the first result, the AXL protein is involved in the entry of viruses into host cells. This annotation carries the evidence code ECO:0000315 (IMP), which stands for “mutant phenotype evidence used in manual assertion”, and it was extracted from the scientific article PMID:21047970 (“The Tyro3 receptor kinase Axl enhances macropinocytosis of Zaire ebolavirus”).
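To make the parts of an annotation concrete, here is a minimal sketch of the AXL record above as a data structure. The class `Annotation`, its field names and the helper `manual_only` are hypothetical and do not mirror the QuickGO export format; the second record is a made-up placeholder that exists only to show how automatic (IEA) annotations could be filtered out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """One gene product linked to one GO term, with its supporting evidence."""
    gene_product: str
    go_term: str
    aspect: str                # "CC", "MF" or "BP"
    evidence_code: str         # short GO evidence code, e.g. "IMP" or "IEA"
    eco_id: Optional[str]      # ECO identifier, when known
    reference: Optional[str]   # literature reference, when known

    @property
    def is_automatic(self) -> bool:
        # In this sketch, only the IEA code marks an automatic annotation.
        return self.evidence_code == "IEA"


# The first QuickGO result described in the text (term name paraphrased
# from the text, not the official term label).
axl_example = Annotation(
    gene_product="AXL (human)",
    go_term="entry of viruses into host cells",
    aspect="BP",
    evidence_code="IMP",
    eco_id="ECO:0000315",
    reference="PMID:21047970",
)

# Hypothetical placeholder record, included only to exercise the filter.
placeholder_iea = Annotation(
    gene_product="hypothetical gene product",
    go_term="hypothetical term",
    aspect="MF",
    evidence_code="IEA",
    eco_id=None,
    reference=None,
)


def manual_only(annotations):
    """Keep only the annotations that were not generated automatically."""
    return [a for a in annotations if not a.is_automatic]
```

Filtering a list containing both records with `manual_only` keeps only the AXL annotation, since its IMP code denotes a manual assertion.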

References

  1. http://geneontology.org/
  2. Ashburner, M., Ball, C., Blake, J., Botstein, D., Butler, H., Cherry, M., Davis, A. et al. Gene Ontology: tool for the unification of biology. Nature Genetics. 25(1):25-9 (2000). doi: 10.1038/75556.
  3. https://fr.wikipedia.org/wiki/Gene_Ontology
  4. The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res. Jan 8;47(D1):D330-D338 (2019). doi: 10.1093/nar/gky1055.
  5. Binns, D., Dimmer, E., Huntley, R., Barrell, D., O’Donovan, C., Apweiler, R. QuickGO: a web-based tool for Gene Ontology searching. Bioinformatics. Nov 15;25(22):3045-6. (2009). doi: 10.1093/bioinformatics/btp536.
  6. https://www.ebi.ac.uk/QuickGO/