Contents
Abstract
List of Tables
List of Figures
Contents
1 Introduction
1.1 Background of Protein Structure Analysis
1.2 Research Hypothesis
1.3 Significance of Study
1.4 Objectives
2 Literature Review
2.1 Protein Structures and Evolution
2.2 Significance of Hemoglobin Protein
2.3 FASTA Sequences and Secondary Structures
2.4 Ramachandran Plot and Phylogenetic tree
3 Proposed Work and Methodology
3.1 Data collection
3.2 Tools and Software Used
3.3 Ramachandran Plot Generation
3.4 Distribution of φ and ψ Angles into Defined Regions
3.5 Comparison of Angle Distribution Across Proteins
3.6 Phylogenetic Tree Construction Based on Angle Distribution
3.7 Comparison of FASTA Sequences
3.8 Comparison of Structural and Sequence-Based Trees
4 Data Analysis/Algorithms
4.1 Overview of Data Analysis Approach
4.2 Ramachandran Plot Algorithm
4.3 Secondary Structure Analysis Algorithm
4.4 Sequence Alignment Algorithm
4.5 Phylogenetic Tree Construction Algorithm
4.6 Validation
5 Results and Discussion
5.1 Ramachandran Plot Result
5.2 Secondary Structure Analysis Result
5.3 Sequence Alignment Result
5.4 Phylogenetic Tree Result
5.5 Discussion
6 Conclusion and Future Work
6.1 Conclusion
6.2 Limitations
6.3 Future Work
Bibliography
Abstract
This study investigates the evolutionary relationships among twelve animal species through an in-depth analysis of their hemoglobin protein structures. Using secondary structure data and FASTA sequences obtained from the Protein Data Bank (PDB), this research applies Ramachandran plot analysis and MATLAB-based phylogenetic tree construction to explore protein structural and sequence similarities. The Ramachandran plots were employed to categorize dihedral angles into specific conformational regions, providing a comparative framework across the species. Phylogenetic trees were then generated from both angular distributions and sequence alignments to assess whether structural similarities correspond with evolutionary relationships. Our findings reveal certain inconsistencies between sequence-based and structure-based evolutionary groupings, suggesting that structural adaptations in hemoglobin may not strictly follow sequence evolution pathways. This work offers insights into the complex relationship between protein structure and evolutionary adaptation, with implications for understanding protein function diversity across species.
List of Tables
5.1 Ramachandran Plot Analysis - No. of Dihedral Angles
5.2 Ramachandran Plot Analysis
5.3 Ramachandran Plot Analysis - Difference in Ramachandran Plot of animals
5.4 Sequences of Animals
5.5 Sequence Analysis - Difference in Sequences of Animals
List of Figures
1.1 The 2024 Nobel Prize laureates in Chemistry. The prize was awarded one half to David Baker “for computational protein design” and the other half jointly to Demis Hassabis and John M. Jumper “for protein structure prediction” [https://www.nobelprize.org/prizes/chemistry/2024/summary/]
1.2 Hemoglobin protein structure of Homo sapiens (human), downloaded from [1]
2.1 Regions of the Ramachandran plot, showing the beta-sheet, right-handed alpha-helix and left-handed alpha-helix regions, with the allowed regions outlined in yellow
3.1 Hemoglobin protein structure of Mouse downloaded from [1]
3.2 Hemoglobin protein structure of Rat downloaded from [1]
3.3 Hemoglobin protein structure of Tuna downloaded from [1]
3.4 Hemoglobin protein structure of Goose downloaded from [1]
3.5 Hemoglobin protein structure of Ostrich downloaded from [1]
3.6 Hemoglobin protein structure of Sheep downloaded from [1]
3.7 Hemoglobin protein structure of Horse downloaded from [1]
3.8 Hemoglobin protein structure of Dog downloaded from [1]
3.9 Hemoglobin protein structure of Cow downloaded from [1]
3.10 Hemoglobin protein structure of Pig downloaded from [1]
3.11 Hemoglobin protein structure of Cat downloaded from [1]
3.12 Ramachandran plot of Human protein
5.1 Ramachandran plot of cat protein
5.2 Ramachandran plot of cow protein
5.3 Ramachandran plot of dog protein
5.4 Ramachandran plot of tuna protein
5.5 Ramachandran plot of goose protein
5.6 Ramachandran plot of horse protein
5.7 Ramachandran plot of mouse protein
5.8 Ramachandran plot of ostrich protein
5.9 Ramachandran plot of pig protein
5.10 Ramachandran plot of rat protein
5.11 Ramachandran plot of sheep protein
5.12 Phylogenetic Tree based on Ramachandran Plots
5.13 Phylogenetic Tree based on FASTA sequences
Chapter 1. Introduction
1.1 Background of Protein Structure Analysis
Proteins are remarkably diverse and essential molecules that catalyze the vast majority of chemical reactions crucial for sustaining life. Serving as enzymes and hormones, as well as providing structural support for cells, proteins are integral to nearly all biological processes. Their diverse functionalities are intrinsically linked to their unique three-dimensional structures, which are dictated by the specific sequences of amino acids that compose them.
Recent breakthroughs in protein science have unlocked exciting new possibilities. The 2024 Nobel Prize in Chemistry highlighted two transformative advances: David Baker’s work on the design of entirely novel proteins, and Demis Hassabis and John Jumper’s AI model AlphaFold2, which predicts protein structures from their amino acid sequences [2]. These achievements not only showcase the remarkable potential of proteins but also provide tools to better understand and harness their functions. Baker’s work has enabled the creation of new proteins with applications ranging from medicine to nanotechnology, while AlphaFold2 has resolved a long-standing challenge, enabling researchers to predict the structures of nearly all known proteins. These discoveries underscore the profound significance of proteins as the "chemical tools" of life.
Figure 1.1: The 2024 Nobel Prize laureates in Chemistry. The prize was awarded one half to David Baker “for computational protein design” and the other half jointly to Demis Hassabis and John M. Jumper “for protein structure prediction” [https://www.nobelprize.org/prizes/chemistry/2024/summary/].
1.2 Research Hypothesis
Inspired by these advances, this study investigates whether comparing the primary structures (FASTA sequences) and secondary structures of proteins can reveal evolutionary connections between species. We focus specifically on hemoglobin across different animal species, because hemoglobin is a critical protein for transporting oxygen in the blood and its structural features are highly conserved across species. The hypothesis is that similarity in protein structure should correlate with evolutionary closeness, reflecting shared ancestry. By examining both the amino acid sequences (primary structures) and the secondary structures, this research explores how well each reflects the evolutionary relationships between species.
Figure 1.2: Hemoglobin protein structure of Homo sapiens (human), downloaded from [1]
1.3 Significance of Study
Understanding the relationship between protein structure and evolutionary closeness can shed light on how species have evolved and adapted over time. Given recent advances in protein design and structure prediction, this study contributes to a growing body of interdisciplinary work that connects structural biology, bioinformatics and evolutionary biology.
Seeing how a protein varies across species can reveal how specific functions emerged, and these insights may in turn inform applied research in other disciplines.
1.4 Objectives
This study sets out to compare the FASTA sequences and secondary structures of hemoglobin proteins from twelve different animal species, with the goal of analyzing their evolutionary relationships. The specific aims are to:
1. Examine hemoglobin protein structures across species using Ramachandran plots, to obtain a detailed view of the backbone dihedral angles underlying their secondary structures, and use FASTA sequences to study the evolutionary relationships between species.
2. Build a phylogenetic tree to visualize the evolutionary relationships among the species based on protein structures as well as FASTA sequences.
3. Assess how closely sequence-based comparisons align with structure-based relationships among these proteins.
4. Explore any differences between sequence and structure comparisons to understand what they reveal about evolutionary relationships.
The data for this investigation were obtained from the online Protein Data Bank [1], and MATLAB was used to generate Ramachandran plots and phylogenetic trees. By combining these computational methods with recent advances in protein science, this research aims to deepen our understanding of protein evolution and to consider how innovations such as those recognized by the 2024 Nobel Prize may shape future scientific research.
Chapter 2. Literature Review
2.1 Protein Structures and Evolution
Proteins are fundamental to the functioning of all living organisms, performing a wide range of biological roles such as catalysis, structural support, transport and immune defense. At the molecular level, the three-dimensional structure of a protein is critical in determining its function. A protein’s amino acid sequence (its primary structure) folds into increasingly complex arrangements, giving rise to secondary, tertiary and sometimes quaternary structure. Understanding how protein structures evolve is essential for interpreting evolutionary relationships between species [3]. Comparative analysis of protein structures helps reveal the common ancestry of proteins, since related proteins tend to conserve the same structural elements. Structural constraints often preserve a protein’s function even as its primary sequence changes.
2.2 Significance of Hemoglobin Protein
Hemoglobin, a hemoprotein responsible for transporting oxygen from the lungs to the tissues in vertebrates, is an ideal candidate for studying evolutionary relationships because of its critical role in survival and its well-characterized structure. Structural conservation across species can be a robust indicator of evolutionary relatedness, particularly when proteins with essential functions, like hemoglobin, are analyzed. Hemoglobin’s function relies on its quaternary structure, which consists of four polypeptide chains (two alpha and two beta chains in adult humans), each containing a heme group that binds oxygen.
Hemoglobin’s evolutionary significance lies in its role in adapting to different environmental conditions, such as varying oxygen levels. Studies comparing the hemoglobin structures of different species have revealed how mutations lead to functional adaptations while preserving overall structural integrity. For example, species living at high altitudes or in deep-water environments often exhibit unique hemoglobin variants that reflect evolutionary pressures. By comparing the FASTA sequences and secondary structures of hemoglobin across different species, researchers can identify conserved regions that point to shared evolutionary history [4].
2.3 FASTA Sequences and Secondary Structures
The primary structure of a protein is represented by its amino acid sequence, commonly denoted in FASTA format. FASTA sequences are essential for bioinformatics analyses as they provide the raw data needed to predict protein structures and evolutionary relationships. Sequence alignment tools compare FASTA sequences from different species to identify conserved regions, substitutions, insertions and deletions, all of which provide clues about evolutionary divergence [3]. Beyond the primary structure, secondary structures of proteins, including alpha helices and beta sheets, arise from hydrogen bonding patterns between backbone atoms. These secondary structures are highly conserved and provide stability to the protein. Comparing secondary structures between species adds another layer of insight, as certain folding patterns tend to be preserved through evolutionary time due to functional constraints. Proteins that perform similar roles across species often exhibit similar secondary structures, even if their primary sequences have diverged. This makes secondary structure comparison a powerful tool in evolutionary biology [5].
2.4 Ramachandran Plot and Phylogenetic tree
The Ramachandran plot is a graphical representation of the backbone dihedral angles φ (rotation about the N–Cα bond) and ψ (rotation about the Cα–C bond) of each residue in a protein. These angles determine the possible conformations a protein’s backbone can adopt. Because only certain combinations of dihedral angles are energetically favorable, the Ramachandran plot is used to assess the stereochemical quality of protein structures, indicating which conformations are allowed or disallowed based on steric clashes and bonding constraints [6].
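For reference, the signed dihedral angle defined by four consecutive backbone atoms at positions r1, r2, r3 and r4 can be written in the standard vector form below (a textbook formulation, not taken from the study's scripts). For φ of residue i the four atoms are C(i−1), N(i), Cα(i), C(i); for ψ they are N(i), Cα(i), C(i), N(i+1). Using atan2 keeps the sign of the rotation, so the result spans −180° to 180°, matching the axes of a Ramachandran plot.

```latex
\begin{aligned}
\mathbf{b}_1 &= \mathbf{r}_2 - \mathbf{r}_1, \qquad
\mathbf{b}_2 = \mathbf{r}_3 - \mathbf{r}_2, \qquad
\mathbf{b}_3 = \mathbf{r}_4 - \mathbf{r}_3,\\
\mathbf{n}_1 &= \mathbf{b}_1 \times \mathbf{b}_2, \qquad
\mathbf{n}_2 = \mathbf{b}_2 \times \mathbf{b}_3,\\
\theta &= \operatorname{atan2}\!\left( \left(\mathbf{n}_1 \times \mathbf{n}_2\right) \cdot \frac{\mathbf{b}_2}{\lVert \mathbf{b}_2 \rVert},\; \mathbf{n}_1 \cdot \mathbf{n}_2 \right)
\end{aligned}
```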
In evolutionary studies, the Ramachandran plot helps in comparing the overall conformational space occupied by proteins from different species. By visualizing the dihedral angles, one can identify whether structural motifs are conserved across species. This information can be correlated with evolutionary proximity, as proteins from closely related species are likely to occupy similar regions in the Ramachandran plot, reflecting conserved structural features.
The phylogenetic tree is a diagram that represents the evolutionary relationships among species, based on similarities and differences in their genetic or protein data. For protein comparisons, a phylogenetic tree is typically constructed by aligning sequences or structures and using algorithms to calculate the evolutionary distance between species. In this study, trees are constructed both from the FASTA sequences and from the secondary-structure (Ramachandran angle) distributions of the hemoglobin proteins of the twelve species.
Figure 2.1: Regions of the Ramachandran plot, showing the beta-sheet, right-handed alpha-helix and left-handed alpha-helix regions, with the allowed regions outlined in yellow.
The resulting phylogenetic tree provides a visual representation of the evolutionary relationships, where closely related species cluster together, while more distantly related species appear further apart [7]. Phylogenetic trees derived from protein data, especially highly conserved proteins like hemoglobin, are often reliable indicators of species relatedness. Differences between the sequence-based and structure-based trees can point to evolutionary pressures that affect the protein structure differently from its sequence.
Chapter 3. Proposed Work and Methodology
This study employs a comprehensive methodology to analyze the structural and evolutionary relationships of hemoglobin proteins from twelve animal species, focusing on both Ramachandran plot analysis and sequence-based comparisons. MATLAB was used extensively to process the data, generate visualizations and construct phylogenetic trees. The analysis was conducted in several stages, as outlined below.
3.1 Data collection
The data was collected from the Protein Data Bank (PDB) [1], an open-access online repository that provides comprehensive structural information on proteins and nucleic acids.
The hemoglobin entry for each species was downloaded from the Protein Data Bank in both FASTA and PDB formats. The FASTA sequences provided the basis for sequence alignment, and the PDB coordinate files provided the basis for structural comparison. The sequences were used to study conserved amino acid regions and their role in the overall stability and function of the hemoglobin molecule. The FASTA sequences of all twelve species are shown in table 5.4.
The primary dataset for this study consisted of the hemoglobin secondary structures of twelve animal species, namely:
1. Human (Homo sapiens) - The PDB file 1A3N represents the structure of deoxy human hemoglobin, an oxygen-transporting protein from Homo sapiens. The structure comprises two distinct chain types, alpha chains and beta chains, originating from human red blood cells. Keywords associated with this structure include oxygen transport, heme, respiratory protein and erythrocyte. The structure was determined by X-ray diffraction at a resolution of 1.80 Å. The study, published in Acta Crystallographica in 2000, reports the structures of deoxy human hemoglobin and a mutant hemoglobin at 120 K [8]. The structure was refined with the REFMAC program using 49,661 reflections. The hemoglobin structure is shown in figure 1.2.
2. Mouse (Mus musculus) - The PDB file 3HRW for mouse hemoglobin contains structural information derived from X-ray diffraction at a resolution of 2.80 Å. The structure consists of two chain types, the alpha subunit (chains A and C) and the beta-1 subunit (chains B and D), sourced from the red blood cells of Mus musculus (mouse). The study, published in Acta Crystallographica Section F in 2021, compared the crystal structure of mouse hemoglobin with those of other small animals and humans [9]. The structure was refined with REFMAC 5.2.0019, and quality indicators such as B-values (related to thermal motion) and correlation coefficients (Fo-Fc) are reported to describe the refinement.
Figure 3.1: Hemoglobin protein structure of Mouse downloaded from [1]
3. Rat (Rattus norvegicus) - The PDB file 3DHT contains the crystal structure of rat hemoglobin from Rattus norvegicus, comprising the alpha subunit (chain A) and the beta-1 subunit (chain B), derived from red blood cells. The entry highlights the function of hemoglobin in oxygen storage and transport. The structure was deposited on June 18, 2008, by K. Neelagandan, P. Sathya Moorthy, M. Balasubramanian, S. Sundaresan and M. N. Ponnuswamy; the findings have not yet been published [10].
Figure 3.2: Hemoglobin protein structure of Rat downloaded from [1]
4. Fish-bluefin tuna (Thunnus thynnus) - The PDB file 1V4X for bluefin tuna hemoglobin A provides structural insights into its pH sensitivity, explaining a phenomenon known as the Root effect, which influences oxygen affinity and release. Published in the Journal of Biological Chemistry in 2004, the study was conducted by T. Yokoyama, K. T. Chong, G. Miyazaki, H. Morimoto, D. T. Shih, S. Unzai, J. R. Tame and S. Y. Park [11]. The structural data were determined through X-ray diffraction at a resolution of 1.60 Å and refined using the REFMAC software.
Figure 3.3: Hemoglobin protein structure of Tuna downloaded from [1]
5. Goose (Anser indicus) - The PDB file 1A4F for bar-headed goose hemoglobin describes the oxygen-transport protein in its oxy form, which is essential for high-altitude respiration in Anser indicus. It includes both alpha (chain A) and beta (chain B) subunits and was solved by X-ray diffraction at 2.00 Å. The study, by J. Zhang, X. Gu, Z. Hua, J. R. Tame and G. Lu, was published in the Journal of Molecular Biology in 1996 and focuses on the high oxygen affinity adaptation of this hemoglobin [12]. The structure was refined with the PROLSQ program of Konnert and Hendrickson.
Figure 3.4: Hemoglobin protein structure of Goose downloaded from [1]
6. Ostrich (Struthio camelus) - The PDB file 3FS4 describes the crystal structure of ostrich hemoglobin, determined by X-ray diffraction at a resolution of 2.2 Å. The structure comprises two hemoglobin subunit types, alpha (chains A and C) and beta (chains B and D), both derived from the ostrich. Key features include its quaternary structure, metal-binding properties and role in oxygen transport and storage. The research provides insights into the oxygen-binding properties of hemoglobin in flightless birds, specifically comparing the ostrich and the turkey. The study was conducted by a team including S. S. Sundaresan and P. Ramesh and was published in Acta Crystallographica in 2021 [9].
7. Sheep (Ovis aries) - The PDB file 2QU0 describes the crystal structure of sheep methemoglobin at a resolution of 2.7 Å. The structure includes two hemoglobin subunit types, alpha (chains A and C) and beta (chains B and D), both sourced from erythrocytes. The structure was determined by X-ray diffraction and contributes to the understanding of methemoglobin’s quaternary structure and its oxygen storage and transport capabilities. The study was conducted by K. Neelagandan, P. Sathya Moorthy, M. Balasubramanian, S. Sundaresan and M. N. Ponnuswamy [13].
Figure 3.5: Hemoglobin protein structure of Ostrich downloaded from [1]
8. Horse (Equus caballus) - The PDB file 2MHB presents the structure of horse (Equus caballus) methemoglobin, determined by X-ray diffraction at a resolution of 2.0 Å. The structure consists of two hemoglobin chains, alpha (chain A) and beta (chain B), both derived from horse erythrocytes. The research, by R. C. Ladner, E. G. Heidner and M. F. Perutz, was published in the Journal of Molecular Biology in 1977 [14]. It emphasizes the role of hemoglobin in oxygen transport and provides insights into the structural characteristics of methemoglobin and the mechanisms of oxygen transport in horses.
9. Dog (Canis lupus familiaris) - The PDB file 3GOU describes the crystal structure of dog (Canis familiaris) hemoglobin, determined by X-ray diffraction at a resolution of 3.00 Å. The structure comprises two hemoglobin subunit types, alpha (chains A and C) and beta (chains B and D). The study focuses on the molecular characteristics related to oxygen transport and the metal-binding capabilities of dog hemoglobin, providing insights into its tetrameric structure and polymorphism. The research was conducted by S. S. Sundaresan, P. Ramesh, M. Thenmozhi and M. N. Ponnuswamy; the findings have not yet been published [15].
Figure 3.6: Hemoglobin protein structure of Sheep downloaded from [1]
Figure 3.7: Hemoglobin protein structure of Horse downloaded from [1]
Figure 3.8: Hemoglobin protein structure of Dog downloaded from [1]
10. Cow (Bos taurus) - The PDB file 6IHX details the crystal structure of bovine hemoglobin, focusing on modifications introduced by single nucleotide polymorphisms (SNPs). It includes two hemoglobin subunit types, alpha (chains A and C) and beta (chains B and D), from Bos taurus (bovine). The research emphasizes heme and iron ion binding, which are crucial for the protein’s oxygen transport function. The structure was determined by X-ray diffraction at a high resolution of 1.46 Å, providing insights into the quaternary structure of hemoglobin in relation to its interactions with albumin. The study, “Quaternary Structure Analysis of a Hemoglobin Core in Hemoglobin-Albumin Cluster”, was published by Y. Morita, T. Yamada, M. Kureishi, K. Kihira and T. Komatsu in J. Phys. Chem. B in 2018 [16].
11. Pig (Sus scrofa) - The PDB file 1QPW presents the crystal structure of porcine hemoglobin from the Taiwanese pig (Sus scrofa) at a resolution of 1.8 Å. It includes both alpha (chains A and C) and beta (chains B and D) subunits of the hemoglobin molecule. The structure was determined by X-ray diffraction and refined with the X-PLOR 3.1 software. The study, by T.-H. Lu, K. Panneerselvam, Y.-C. Liaw, P. Kan and C.-J. Lee, was published in Acta Crystallographica Section D in 2000 [17].
Figure 3.9: Hemoglobin protein structure of Cow downloaded from [1]
Figure 3.10: Hemoglobin protein structure of Pig downloaded from [1]
12. Cat (Felis catus) - The PDB file 3GYS describes the crystal structure of hemoglobin from the domestic cat (Felis silvestris catus) at a resolution of 2.9 Å. The structure includes both alpha (chains A, C, E and G) and beta (chains B, D, F and H) subunits. The structure was determined by X-ray diffraction, and the results contribute to understanding the functional properties of cat hemoglobin, particularly its polymorphism and chromatographic behavior. The authors are M. Balasubramanian, P. Sathya Moorthy, K. Neelagandan and M. N. Ponnuswamy; the findings have not yet been published [18].
Figure 3.11: Hemoglobin protein structure of Cat downloaded from [1]
3.2 Tools and Software Used
To perform the sequence and structural analyses, the following tools and software were used:
MATLAB: MATLAB was the primary tool for running custom scripts and performing the data analysis; it was chosen for its powerful data analysis capabilities. Custom code was written to:
1. Generate Ramachandran plots for each species’ hemoglobin structure to visualize the conformational space occupied by the protein.
2. Align FASTA sequences to identify similarities and differences in amino acid sequences between species.
3. Construct phylogenetic trees to depict evolutionary relationships based on sequence and structural similarities.
3.3 Ramachandran Plot Generation
The first step of the analysis involved generating Ramachandran plots for the set of proteins for all the twelve species.
Figure 3.12: Ramachandran plot of Human protein
MATLAB was used to compute the φ and ψ torsion angles of each protein from its structural data, obtained from Protein Data Bank (PDB) files.
Input Data: Protein structures were loaded into MATLAB from PDB files available at [1]. For each protein, the backbone torsion angles φ and ψ were calculated.
Plot Generation: Using the calculated torsion angles, a Ramachandran plot was created for each protein, with φ on the x-axis and ψ on the y-axis. These plots show the conformational space occupied by the protein’s backbone dihedral angles.
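The study's MATLAB scripts are not reproduced in this excerpt. As a hedged illustration of the same computation, the standard-library Python sketch below reads the backbone N, Cα and C atoms of one chain from the fixed-column ATOM records of a PDB file and computes φ and ψ per residue with the signed dihedral formula. Assumptions: only the first alternate location is kept, insertion codes are ignored, and a jump in residue numbering is treated as a chain break (the angle is then left undefined). The function names `dihedral`, `backbone_atoms` and `phi_psi` are hypothetical.

```python
import math


def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle p1-p2-p3-p4 in degrees (IUPAC sign convention)."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = dot(cross(n1, n2), b2) / math.sqrt(dot(b2, b2))
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def backbone_atoms(pdb_lines, chain_id):
    """Collect N, CA and C coordinates per residue of one chain, in file order."""
    residues, order = {}, []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name, alt, chain = line[12:16].strip(), line[16], line[21]
        if chain != chain_id or name not in ("N", "CA", "C") or alt not in (" ", "A"):
            continue
        resseq = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if resseq not in residues:
            residues[resseq] = {}
            order.append(resseq)
        residues[resseq].setdefault(name, xyz)  # keep the first copy of each atom
    return [(resseq, residues[resseq]) for resseq in order]


def phi_psi(pdb_lines, chain_id):
    """Return (residue number, phi, psi); None where an angle is undefined."""
    res = backbone_atoms(pdb_lines, chain_id)
    out = []
    for i, (resseq, atoms) in enumerate(res):
        if not all(k in atoms for k in ("N", "CA", "C")):
            continue
        phi = psi = None
        prev_ok = i > 0 and res[i - 1][0] == resseq - 1 and "C" in res[i - 1][1]
        next_ok = i + 1 < len(res) and res[i + 1][0] == resseq + 1 and "N" in res[i + 1][1]
        if prev_ok:
            phi = dihedral(res[i - 1][1]["C"], atoms["N"], atoms["CA"], atoms["C"])
        if next_ok:
            psi = dihedral(atoms["N"], atoms["CA"], atoms["C"], res[i + 1][1]["N"])
        out.append((resseq, phi, psi))
    return out
```

For example, opening a downloaded entry such as 1A3N and passing the file object with chain "A" to `phi_psi` would yield one (residue, φ, ψ) tuple per residue; the first residue has no φ and the last has no ψ. Libraries such as Biopython's `Bio.PDB` offer equivalent, more robust parsing.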
3.4 Distribution of φ and ψ Angles into Defined Regions
Following the generation of the Ramachandran plots, the φ and ψ angles were categorized into 10 specific regions based on established Ramachandran plot conventions. Each region corresponds to a distinct conformation type within the protein structure; see figure 2.1. The regions were the following:
1. The core beta
2. The core left alpha
3. The core right alpha
4. The allowed 1
5. The allowed 2
6. The allowed 3
7. The allowed 4
8. The allowed 5
9. The allowed 6
10. The disallowed region
Region Assignment: For each protein, MATLAB scripts assigned each pair of φ and ψ angles to one of these 10 regions. This classification shows how different proteins utilize conformational space.
Visualization: A summary of the distribution of φ and ψ angles across the 10 regions was visualized in MATLAB for each protein, providing a comparative view of conformational preferences; data shown in table 5.1.
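The boundaries of the ten regions used in the study are not given in the text. The Python sketch below therefore uses three hypothetical rectangular boxes for the core regions and puts every other angle pair into a single "other" bin, purely to illustrate how (φ, ψ) pairs can be assigned to named regions and counted per protein. A faithful reproduction would replace the `REGIONS` table with the study's ten region definitions; in common validation tools these are defined on a grid or by contours rather than as simple rectangles. `REGIONS`, `assign_region` and `region_counts` are illustrative names.

```python
# HYPOTHETICAL, simplified region boxes: (name, phi_min, phi_max, psi_min, psi_max) in degrees.
# These are NOT the study's ten regions; they only illustrate the lookup step.
REGIONS = [
    ("core right alpha", -100.0, -30.0, -80.0, -10.0),
    ("core beta", -180.0, -45.0, 90.0, 180.0),
    ("core left alpha", 30.0, 90.0, 0.0, 80.0),
]


def assign_region(phi, psi, regions=REGIONS, default="other"):
    """Return the name of the first region box containing (phi, psi)."""
    for name, phi_lo, phi_hi, psi_lo, psi_hi in regions:
        if phi_lo <= phi <= phi_hi and psi_lo <= psi <= psi_hi:
            return name
    return default


def region_counts(angles, regions=REGIONS, default="other"):
    """Count residues per region from (phi, psi) pairs."""
    counts = {name: 0 for name, *_ in regions}
    counts[default] = 0
    for phi, psi in angles:
        if phi is None or psi is None:
            continue  # terminal residues lack phi or psi
        counts[assign_region(phi, psi, regions, default)] += 1
    return counts
```

Keeping the regions as data rather than hard-coded conditions makes it straightforward to swap in a full ten-region definition without changing the counting logic.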
3.5 Comparison of Angle Distribution Across Proteins
Once the torsion angles were categorized, the distributions were compared across all proteins.
Quantitative Comparison: The frequency of angles in each region was quantified for each protein. The similarity in the use of Ramachandran space was then compared between proteins to identify patterns of conformational similarities or differences; data shown in table 5.2.
3.6 Phylogenetic Tree Construction Based on Angle Distribution
To quantify the similarities between proteins based on their φ and ψ angle distributions, a phylogenetic tree was generated.
Distance Calculation: A distance matrix was computed using the distributions of φ and ψ angles across the 10 regions. This matrix quantifies the dissimilarity between each pair of proteins based on how their torsion angles are distributed; data shown in table 5.3.
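The text does not specify which distance measure was used. One reasonable choice, shown here only as an assumption, is to convert each protein's region counts into fractions and take the Euclidean distance between the resulting vectors. Normalizing matters because the PDB entries described above contain different numbers of chains (for example, two for rat and eight for cat), so raw counts would partly reflect chain count rather than conformation. The function names are hypothetical.

```python
import math


def to_fractions(counts, regions):
    """Convert per-region counts into fractions that sum to 1."""
    total = sum(counts[r] for r in regions)
    return [counts[r] / total for r in regions]


def distance_matrix(profiles, regions):
    """Euclidean distances between normalized region profiles.

    profiles: {protein name: {region name: count}}
    Returns (names, matrix) with matrix[i][j] the distance between names[i] and names[j].
    """
    names = list(profiles)
    vectors = {n: to_fractions(profiles[n], regions) for n in names}
    matrix = [[math.dist(vectors[a], vectors[b]) for b in names] for a in names]
    return names, matrix
```

Other measures (for example, a chi-square or Jensen-Shannon distance between the distributions) could be substituted in `distance_matrix` without changing the rest of the pipeline.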
Tree Generation: MATLAB’s phylogenetic functions were employed to construct a phylogenetic tree from the distance matrix. This tree reflects the structural similarities between proteins based on their Ramachandran plot analysis; shown in figure 5.12.
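The text does not say which tree-building method was applied to the distance matrix. As one common option (MATLAB's Bioinformatics Toolbox provides it through `seqlinkage` with average linkage), UPGMA repeatedly merges the two closest clusters and recomputes the distance to the merged cluster as a size-weighted average. The stdlib-only Python sketch below, with the hypothetical name `upgma`, returns only the tree topology as nested tuples of species names and omits branch lengths for brevity.

```python
def upgma(names, dist):
    """Return the UPGMA tree topology as nested tuples of leaf names.

    names: list of taxon labels; dist: symmetric distance matrix (list of lists).
    Branch lengths are omitted in this sketch.
    """
    # cluster id -> (subtree, number of leaves in it)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # pairwise distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), _ = min(d.items(), key=lambda item: item[1])  # closest pair
        (tree_i, size_i), (tree_j, size_j) = clusters.pop(i), clusters.pop(j)
        del d[(i, j)]
        new = next_id
        next_id += 1
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            # size-weighted average = mean over all leaf pairs (the UPGMA rule)
            d[(k, new)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[new] = ((tree_i, tree_j), size_i + size_j)
    return next(iter(clusters.values()))[0]
```

UPGMA assumes a roughly constant rate of change across lineages; neighbor joining is the usual alternative when that assumption is doubtful.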
3.7 Comparison of FASTA Sequences
In parallel, the proteins were compared at the sequence level using their FASTA sequences.
FASTA Input: The amino acid sequences of the proteins were input in FASTA format, and sequence alignment was performed to assess the similarity between the proteins at the sequence level; data shown in table 5.5.
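The alignment algorithm and scoring scheme are not specified in the text. The sketch below shows textbook Needleman–Wunsch global alignment of two sequences with a deliberately simple, hypothetical scoring scheme (match +1, mismatch −1, linear gap −2); practical protein alignments normally use a substitution matrix such as BLOSUM62 and affine gap penalties. Percent identity is computed over columns where neither sequence has a gap, which is only one of several conventions in use. Function names are illustrative.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical residues as a percentage of gap-free aligned columns."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0
```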
Phylogenetic Tree from Sequences: Based on the aligned sequences, another phylogenetic tree was generated using MATLAB’s sequence analysis tools. This tree represents the evolutionary relationships and sequence similarities between the proteins; shown in figure 5.13.
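Before a sequence-based tree can be built, the aligned sequences have to be turned into a distance matrix. A minimal option, assumed here because the text does not name one, is the p-distance: the proportion of aligned sites at which two sequences differ, skipping columns with a gap in either sequence. Corrected measures (for example, the Poisson correction for proteins) are often preferred for more divergent sequences. The resulting matrix can then be passed to any distance-based tree method, such as UPGMA or neighbor joining. Function names are hypothetical.

```python
def p_distance(s1, s2):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    sites = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable (gap-free) sites")
    return sum(x != y for x, y in sites) / len(sites)


def sequence_distance_matrix(aligned):
    """aligned: {species name: aligned sequence of equal length}."""
    names = list(aligned)
    matrix = [[p_distance(aligned[a], aligned[b]) for b in names] for a in names]
    return names, matrix
```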
3.8 Comparison of Structural and Sequence-Based Trees
Both the angle-based phylogenetic tree and the sequence-based phylogenetic tree were compared to evaluate the correlation between structural and evolutionary relationships.
Analysis: By comparing the two trees, insights were gained into whether proteins with similar structural features (based on their φ and ψ angle distributions) also shared evolutionary relationships (based on their sequences). Differences between the trees highlight cases where structural similarity does not necessarily align with sequence similarity.
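The text compares the two trees qualitatively. One way to put a number on that comparison, not described in the text and offered only as a hedged illustration, is the Robinson–Foulds distance: the number of clusters (sets of leaves under an internal node) present in one tree but not the other. The sketch treats both trees as rooted and represents them as nested tuples of species names; for unrooted trees the same idea is applied to bipartitions. Function names are hypothetical.

```python
def internal_clusters(tree):
    """Leaf sets below each internal node of a rooted nested-tuple tree, excluding the root."""
    found = set()

    def walk(node):
        if isinstance(node, tuple):
            leaves = frozenset().union(*(walk(child) for child in node))
            found.add(leaves)
            return leaves
        return frozenset([node])  # a leaf is a plain label

    root = walk(tree)
    found.discard(root)  # every tree on the same taxa shares the root cluster
    return found


def robinson_foulds(tree_a, tree_b):
    """Rooted Robinson-Foulds distance: number of clusters found in exactly one tree."""
    return len(internal_clusters(tree_a) ^ internal_clusters(tree_b))
```

A distance of 0 means the structure-based and sequence-based trees have identical groupings; larger values indicate more disagreement.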
Chapter 4. Data Analysis/Algorithms
4.1 Overview of Data Analysis Approach
In this study, we compared both the secondary structures and the FASTA sequences of hemoglobin proteins from twelve animal species to analyze the evolutionary relationships between these species. Using bioinformatics algorithms and structural visualization techniques, we aimed to identify conserved and variable regions of hemoglobin across species, which could offer insights into the protein’s evolutionary and functional conservation. The structural analysis compared φ and ψ torsion angles across the twelve species, using the Ramachandran plot to assign each angle pair to one of ten predefined regions: the core beta, core left alpha and core right alpha regions, six allowed regions, and the disallowed region. This categorization gave a detailed view of how each protein uses conformational space. The analysis involved multiple sequence alignment, Ramachandran plot generation, phylogenetic tree construction and secondary structure analysis, each carried out with MATLAB and its bioinformatics tools. Quantitative comparisons of the frequency distribution of φ and ψ angles in each region then made it possible to identify shared conformational tendencies and unique structural patterns across species.
4.2 Ramachandran Plot Algorithm
The Ramachandran plot analysis was performed to understand the conformational preferences of hemoglobin proteins across species. The backbone dihedral angles φ and ψ of each residue were calculated and plotted, allowing us to examine which residues fall in the allowed and disallowed regions of each structure.
Method: Custom MATLAB scripts calculated the φ and ψ angles for each residue from the PDB structural data. These values were plotted on a Ramachandran plot for each species, and the residues were categorized into allowed, partially allowed and disallowed regions.
Interpretation: Ramachandran plots offered insights into structural stability, with residues in allowed regions suggesting stable conformations. By comparing plots across species, we could observe structural similarities and differences in hemoglobin, providing a basis for understanding conformational constraints and evolutionary adaptations.
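The standard backbone definitions are φ(i) = dihedral(C(i-1), N(i), CA(i), C(i)) and ψ(i) = dihedral(N(i), CA(i), C(i), N(i+1)). The Python sketch below shows how these can be computed. It is not the authors' MATLAB script, which is not shown in this section. It computes a signed dihedral angle from four atom positions and applies it along one chain. The sketch assumes the residues are consecutive with no chain breaks. Each residue is a dictionary of N, CA and C coordinates, which is a hypothetical input format; reading the PDB file is left out.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees, in [-180, 180], defined by four atom positions."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)  # unit vector along the central bond
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


Residue = Dict[str, Vec]  # hypothetical format: atom name ("N", "CA", "C") -> (x, y, z)


def backbone_phi_psi(residues: List[Residue]) -> List[Tuple[Optional[float], Optional[float]]]:
    """(phi, psi) for each residue of one continuous chain; None where a neighbor is missing."""
    angles = []
    last = len(residues) - 1
    for i, res in enumerate(residues):
        # phi(i) uses C of the previous residue, so it is undefined for the first residue
        phi = dihedral(residues[i - 1]["C"], res["N"], res["CA"], res["C"]) if i > 0 else None
        # psi(i) uses N of the next residue, so it is undefined for the last residue
        psi = dihedral(res["N"], res["CA"], res["C"], residues[i + 1]["N"]) if i < last else None
        angles.append((phi, psi))
    return angles
```

The resulting (φ, ψ) pairs, minus the undefined terminal values, are what a Ramachandran plot places on its two axes.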
4.2.1 MATLAB CODE
[MATLAB code listing not included in this excerpt]
4.3 Secondary Structure Analysis Algorithm
Secondary structure analysis of hemoglobin proteins was conducted to examine patterns of alpha helices, beta sheets and loops across species. Such elements are critical for the stability and function of hemoglobin and are often conserved due to their structural significance.
Method: Secondary structure elements were extracted from the PDB files and processed in MATLAB to identify patterns of structural conservation across species. Alpha helices, which dominate the globin fold, and beta sheets were compared across species to assess structural conservation.
Significance: By analyzing the secondary structures, we were able to understand how the folding patterns of hemoglobin were conserved or varied. Such conservation or divergence in folding patterns offers insights into how specific structural features are crucial for function and how evolutionary forces may act to preserve these features.
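The Python sketch below illustrates one way to compare helix and strand content between species using a per-residue secondary structure string. The encoding is an assumption: H for helix, E for strand, and anything else counted as coil. The source of the string is also assumed, for example the HELIX/SHEET records of a PDB file or a DSSP assignment. The sketch does not reproduce the authors' MATLAB processing.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def ss_composition(ss: str) -> Dict[str, float]:
    """Fractions of helix ('H'), strand ('E') and all other (coil) residues."""
    n = len(ss)
    if n == 0:
        return {"helix": 0.0, "strand": 0.0, "coil": 0.0}
    helix = ss.count("H")
    strand = ss.count("E")
    return {"helix": helix / n, "strand": strand / n, "coil": (n - helix - strand) / n}


def segments(ss: str, code: str = "H") -> List[Tuple[int, int]]:
    """0-based inclusive (start, end) positions of each run of `code`, e.g. each helix."""
    runs = []
    pos = 0
    for symbol, group in groupby(ss):
        length = len(list(group))
        if symbol == code:
            runs.append((pos, pos + length - 1))
        pos += length
    return runs


def composition_difference(ss_a: str, ss_b: str) -> float:
    """Sum of absolute differences in helix/strand/coil fractions between two chains."""
    a, b = ss_composition(ss_a), ss_composition(ss_b)
    return sum(abs(a[key] - b[key]) for key in a)
```

Composition captures overall folding content. Comparing `segments` output between species shows whether individual helices start and end at corresponding positions.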
4.4 Sequence Alignment Algorithm
To analyze sequence similarity across the hemoglobin proteins, multiple sequence alignment (MSA) was performed using MATLAB’s Bioinformatics Toolbox. MSA is crucial for identifying conserved amino acid regions, which can indicate structural stability and functional importance in evolution.
Method: The FASTA sequences were aligned, and the resulting sequence-based relationships were compared with the angle distributions through phylogenetic trees generated in MATLAB, helping to correlate sequence-based and structural relationships.
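The step from aligned sequences to a distance matrix can be sketched in Python as follows. The sketch assumes the sequences are already aligned, for example the output of an MSA tool, with "-" marking gaps. It uses the uncorrected p-distance, the fraction of differing residues among gap-free columns. The study does not state which distance measure was passed to MATLAB, so this choice is illustrative.

```python
from typing import List


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # pairwise deletion: skip columns with a gap in either sequence
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differing / compared


def distance_matrix(aligned: List[str]) -> List[List[float]]:
    """Symmetric matrix of pairwise p-distances between aligned sequences."""
    n = len(aligned)
    return [[p_distance(aligned[i], aligned[j]) for j in range(n)] for i in range(n)]
```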
Significance: By comparing sequence alignments across species, we could identify conserved motifs within hemoglobin, which are indicative of functionally or structurally essential regions. These conserved regions, in turn, provide insight into evolutionary pressures that may have shaped hemoglobin’s structure.
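Conserved positions can be flagged with a simple per-column score, shown here as a hypothetical Python sketch. A column's score is the fraction of all sequences that carry its most common residue, so gaps count against conservation. A column is called conserved when its score reaches a threshold. Both the scoring rule and the default threshold of 1.0 (fully identical columns) are assumptions, not the study's settings.

```python
from collections import Counter
from typing import List


def column_conservation(aligned: List[str]) -> List[float]:
    """Per column: share of ALL sequences carrying the column's most common residue."""
    if not aligned:
        return []
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for col in range(len(aligned[0])):
        residues = [seq[col] for seq in aligned if seq[col] != "-"]  # gaps never "win"...
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(aligned))  # ...but still count in the denominator
    return scores


def conserved_columns(aligned: List[str], threshold: float = 1.0) -> List[int]:
    """0-based indices of columns whose score reaches `threshold` (assumed cut-off)."""
    return [i for i, score in enumerate(column_conservation(aligned)) if score >= threshold]
```

Runs of consecutive conserved column indices correspond to the conserved motifs described above.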
4.5 Phylogenetic Tree Construction Algorithm
A phylogenetic tree was constructed to map the evolutionary relationships of the twelve species based on their hemoglobin sequences and structural data. This tree offered a visualization of evolutionary proximity, highlighting both sequence and structural conservation.
Method: Using MATLAB’s phylogenetic functions, trees were built from distance matrices derived from sequence similarity scores and from structural parameters (the angle distributions). This allowed species to be clustered both by sequence alignment and by structural similarity.
Interpretation: The phylogenetic tree provided a hierarchical representation of evolutionary relationships. Closer branches indicated species with high similarity (in sequence or in structure, depending on the tree), supporting hypotheses of evolutionary relatedness. Discrepancies in clustering between the sequence-based and structure-based trees were noted, as they might point to unique evolutionary adaptations in structure or function. The underlying causes of these differences will be a focus of future research.
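A distance matrix, whether from sequences or from angle-distribution profiles, can be turned into a rooted tree by agglomerative clustering. The section does not name the method passed to MATLAB's phylogenetic functions. UPGMA (average linkage) is therefore assumed here purely for illustration; neighbor joining is a common alternative. The Python sketch repeatedly joins the two closest clusters. It then sets the distance from the new cluster to every other cluster to the size-weighted average of the two old distances.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, tuple]  # leaf name or (left_subtree, right_subtree)


def upgma(names: List[str], dist: List[List[float]]) -> Tuple[Tree, float]:
    """UPGMA clustering of a symmetric distance matrix.

    Returns the tree as nested 2-tuples and the height of its root node
    (half the distance at which the last two clusters were joined).
    """
    if not names:
        raise ValueError("need at least one species")
    # Active clusters: id -> (subtree, number of leaves, node height)
    clusters: Dict[int, Tuple[Tree, int, float]] = {
        i: (name, 1, 0.0) for i, name in enumerate(names)
    }
    # Distances between active clusters, keyed by (smaller id, larger id)
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda item: item[1])  # closest pair
        tree_i, size_i, _ = clusters.pop(i)
        tree_j, size_j, _ = clusters.pop(j)
        del d[(i, j)]
        for k in clusters:
            # Size-weighted average of the distances to the two merged clusters
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j, dij / 2)
        next_id += 1
    tree, _, height = next(iter(clusters.values()))
    return tree, height


# Toy matrix with hypothetical labels (not the study's data)
toy = [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]]
print(upgma(["A", "B", "C", "D"], toy))  # prints ('D', ('C', ('A', 'B'))), 5.0 as a tuple
```

UPGMA assumes a roughly constant rate of change across lineages. When that assumption is doubtful, a method such as neighbor joining may be preferable.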
4.6 Validation
To ensure the robustness of our analysis, we validated our sequence alignment results with structural analyses obtained from the Ramachandran plot and secondary structure comparisons.
Cross-validation Techniques: Sequence-based and structure-based results were cross-referenced. For instance, regions identified as conserved in the MSA were checked against the Ramachandran plot results to confirm structural stability. The angle-based and sequence-based phylogenetic trees were also compared to analyze the correlation between structural and evolutionary relationships. Discrepancies between the sequence and structure analyses were examined carefully: where sequences were highly similar but the structures differed, the difference was noted as a potential functional adaptation. The primary reasons for these differences will be explored further in future studies.
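The cross-check of MSA-conserved regions against Ramachandran results requires mapping each alignment column back to a residue of each structure. The Python sketch below does this by skipping gap characters. It rests on a hypothetical assumption: the per-residue region labels (one per residue, in sequence order) line up one-to-one with the FASTA sequence. That assumption fails if the PDB model is missing residues.

```python
from typing import List, Optional


def column_to_residue(aligned_seq: str) -> List[Optional[int]]:
    """Map each alignment column to a 0-based index in the ungapped sequence (None at gaps)."""
    mapping: List[Optional[int]] = []
    index = 0
    for symbol in aligned_seq:
        if symbol == "-":
            mapping.append(None)
        else:
            mapping.append(index)
            index += 1
    return mapping


def regions_at_columns(aligned_seq: str, residue_regions: List[str],
                       columns: List[int]) -> List[Optional[str]]:
    """Ramachandran region label of this species' residue at each given column (None at gaps).

    Hypothetical assumption: residue_regions[k] belongs to the k-th residue of the
    ungapped FASTA sequence, i.e. the structure has no missing residues.
    """
    mapping = column_to_residue(aligned_seq)
    labels: List[Optional[str]] = []
    for col in columns:
        res_index = mapping[col]
        labels.append(None if res_index is None else residue_regions[res_index])
    return labels
```

A conserved column whose residue falls in a disallowed region in some species would be a candidate for closer inspection.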
Chapter 5. Results and Discussion
5.1 Ramachandran Plot Result
The Ramachandran plots displayed distinctive patterns in φ and ψ angle distributions for each species, with the core regions (beta and alpha) showing relatively consistent angle frequencies across species.
Interpretation: For all species, the majority of dihedral angles fell within the allowed regions, with only a small number in the disallowed regions (Tables 5.1 and 5.2). This high occupancy of the allowed regions suggests that hemoglobin's structural constraints are conserved because of functional requirements. Variations in dihedral angles observed in some species could reflect adaptations to particular environmental conditions or physiological demands, such as oxygen storage or delivery under varying pressures. Based on the differences observed in the Ramachandran plots, a phylogenetic tree was constructed to illustrate the evolutionary relationships among the species. In this tree, cows were most closely related to humans, followed by fish, then horses, and were least related to dogs.
The Ramachandran plots for the remaining species are presented in Figures 5.1 to 5.11; the plot for human is shown in Figure 3.12 (Section 3.3).
5.2 Secondary Structure Analysis Result
The secondary structure analysis revealed consistent patterns of alpha helices and beta sheets across species, supporting the hypothesis that these structures are evolutionarily conserved (Tables 5.1 and 5.2). Most species showed similar folding patterns, particularly in regions critical for hemoglobin's function.
Interpretation: The conservation of these secondary structures highlights the importance of hemoglobin’s 3D configuration in binding and transporting oxygen. The evolutionary retention of these structural elements across species emphasizes the functional constraints placed on hemoglobin, as alterations could compromise oxygen-binding efficiency.
Figure 5.1: Ramachandran plot of cat protein
Figure 5.2: Ramachandran plot of cow protein
Figure 5.3: Ramachandran plot of dog protein
Figure 5.4: Ramachandran plot of tuna protein
Figure 5.5: Ramachandran plot of goose protein
Figure 5.6: Ramachandran plot of horse protein
Figure 5.7: Ramachandran plot of mouse protein
Figure 5.8: Ramachandran plot of ostrich protein
Figure 5.9: Ramachandran plot of pig protein
Figure 5.10: Ramachandran plot of rat protein
Figure 5.11: Ramachandran plot of sheep protein
Table 5.1: Ramachandran Plot Analysis - Number of Dihedral Angles
Table 5.2: Ramachandran Plot Analysis
Table 5.3: Ramachandran Plot Analysis - Differences in Ramachandran Plots of Animals
5.3 Sequence Alignment Result
The multiple sequence alignment (MSA) of hemoglobin sequences across the twelve species revealed several highly conserved regions.
Interpretation: The conservation of specific regions suggests evolutionary pressures to maintain the function, as hemoglobin’s role in oxygen transport is essential for survival. Based on the differences between the sequences of all species, a phylogenetic tree was generated. The results indicated that cows are most closely related to sheep, followed by horses, then humans, and are least related to fish among the species analyzed.
Table 5.4: Sequences of Animals
Table 5.5: Sequence Analysis - Differences in Sequences of Animals
5.4 Phylogenetic Tree Result
Tree Structure and Evolutionary Clustering: The angle-based phylogenetic tree, constructed from the Ramachandran plot data, showed clustering that correlated closely with structural conformations in the proteins. This tree was further compared to the sequence-based tree, where it was found that structural similarities did not always align with sequence similarities, suggesting possible functional adaptations or evolutionary divergence.
Interpretation: Specific species pairs that displayed similar angle distributions in the core and allowed regions appeared closely linked on the structural tree, while those with distinct distributions occupied separate branches. Such findings reinforce the hypothesis that conformational patterns can reflect structural adaptations not always evident in sequence data alone. The primary focus of this study was to analyze the evolutionary relationships among species by examining the similarities in their Ramachandran plots, based on the secondary structures of proteins. The sequence-based phylogenetic tree indicated that cows are most closely related to sheep, followed by horses, then humans, with the least relatedness to ostriches and fish. In contrast, the tree generated from the Ramachandran plot showed cows as most closely related to humans, followed by fish, then horses and ostriches, with the least relatedness to dogs.
5.5 Discussion
Discussion of Discrepancies: By integrating the results from sequence alignment, Ramachandran plots, phylogenetic analysis and secondary structure examination, a comprehensive understanding of hemoglobin’s evolutionary trajectory emerges. Notably, some species with high sequence similarity showed minor structural deviations, as evidenced in their Ramachandran plots or secondary structure configurations. These discrepancies may reflect functional adaptations to specific ecological niches, suggesting that while sequence conservation is significant, structural flexibility may arise to meet unique physiological demands. A more in-depth investigation into the reasons behind the differences in the results of both trees will be the focus of future studies.
Figure 5.12: Phylogenetic Tree based on Ramachandran Plots
Figure 5.13: Phylogenetic Tree based on FASTA sequences
Chapter 6. Conclusion and Future Work
6.1 Conclusion
This study aimed to explore the evolutionary relationships among twelve animal species by analyzing hemoglobin through its secondary structures and FASTA sequences. Using custom MATLAB code for sequence alignment, Ramachandran plots and phylogenetic trees, this research provided insights into the structural conservation and functional evolution of hemoglobin, an essential protein responsible for oxygen transport. The sequence alignment results indicated significant conservation across the hemoglobin sequences, particularly in regions critical to oxygen binding. The Ramachandran plots showed that most residues in the hemoglobin structures fall within the allowed conformational regions, highlighting structural stability across species. The phylogenetic analysis clustered species according to evolutionary lineages, while the secondary structure comparisons showed consistent patterns of alpha helices and beta sheets, underscoring the evolutionary importance of hemoglobin's 3D structure. Together, these findings reinforce the hypothesis that hemoglobin structure and function are evolutionarily conserved among species, with minor variations possibly reflecting adaptations to different environmental conditions. This study highlights the value of combining sequence and structural analyses to understand evolutionary relationships. By examining both conserved sequences and structural adaptations, this research offers useful perspectives on how essential proteins such as hemoglobin have evolved to meet the physiological demands of different species. These insights contribute to evolutionary biology and structural biochemistry and provide a foundation for further studies of protein structure-function relationships across species.
6.2 Limitations
A few limitations affected the scope and outcomes of this study. First, the analysis was limited to twelve species and focused solely on hemoglobin; including a broader range of proteins and species could yield a more comprehensive understanding of evolutionary patterns. Second, the study relied on available PDB data, which may not fully represent structural diversity in natural environments. These limitations may have influenced the findings, as a larger dataset or additional protein families could reveal alternative evolutionary relationships or structural variations. Finally, computational constraints, particularly with large datasets, may have limited the resolution of some analyses. Despite these constraints, the results offer a strong foundation for understanding the conserved and adaptive nature of hemoglobin structures across species.
6.3 Future Work
Future research can expand on this study by analyzing hemoglobin in a larger range of species or by examining other essential proteins or protein families to provide a comparative perspective. Such an expansion could help validate and deepen the understanding of evolutionary relationships and structural conservation in various protein families. To further refine these analyses, future work could employ advanced computational methods or machine learning tools, which have shown promise in recent studies. Beyond expanding the set of species and protein families, future studies could investigate the functional implications of the observed structural variations. This would provide a deeper understanding of the reasons behind the differences between the sequence-based and structure-based phylogenetic trees. This line of research could also help in designing therapeutic proteins with enhanced stability or function.
Bibliography
[1] RCSB Protein Data Bank, “RCSB Protein Data Bank,” 2023. Accessed: 2024-09-05.
[2] Nobel Prize Outreach AB, “Popular information,” 2024.
[3] L. Kocincova, M. Jaresova, J. Byska, B. Kozlikova, and M. Krone, “Comparative visualization of protein secondary structures,” BMC Bioinformatics, vol. 18 (Suppl 2), p. 23, 2017.
[4] H. F. Bunn, “Evolution of mammalian hemoglobin function,” Blood, vol. 58, no. 2, pp. 189-197, 1981.
[5] C. Chothia and A. M. Lesk, “The relation between the divergence of sequence and structure in proteins,” The EMBO Journal, vol. 5, no. 4, pp. 823-826, 1986. Communicated by M. F. Perutz.
[6] F. Carrascoza, S. Zaric, and R. Silaghi-Dumitrescu, “Computational study of protein secondary structure elements: Ramachandran plots revisited,” Journal of Molecular Graphics and Modelling, vol. 50, pp. 125-133, 2014.
[7] D. Penny, L. Foulds, and M. Hendy, “Testing the theory of evolution by comparing phylogenetic trees constructed from five different protein sequences,” Nature, vol. 297, pp. 197-200, 1982.
[8] J. Tame, “The structures of deoxy human haemoglobin and an artificial mutant (Tyrα42→His) have been solved at 120 K,” Acta Crystallographica, 2000.
[9] S. Sundaresan, P. Ramesh, N. Shobana, T. Vinuchakkaravarthy, S. Yasien, and M. Ponnuswamy, “Crystal structure of hemoglobin from mouse (Mus musculus) compared with those from other small animals and humans,” Acta Crystallographica Section F, 2021.
[10] K. Neelagandan, P. S. Moorthy, M. Balasubramanian, S. Sundaresan, and M. N. Ponnuswamy, “The crystal structure determination of rat (Rattus norvegicus) hemoglobin,” To be published, 2008. PDB ID: 3DHT.
[11] T. Yokoyama, K. Chong, G. Miyazaki, H. Morimoto, D. Shih, S. Unzai, J. Tame, and S. Park, “Novel mechanisms of pH sensitivity in tuna hemoglobin: A structural explanation of the Root effect,” Journal of Biological Chemistry, 2004.
[12] J. Zhang, X. Gu, Z. Hua, J. Tame, and G. Lu, “The crystal structure of a high oxygen affinity species of hemoglobin (bar-headed goose hemoglobin in the oxy form),” Journal of Molecular Biology, 1996.
[13] K. Neelagandan, P. S. Moorthy, M. Balasubramanian, S. Sundaresan, and M. N. Ponnuswamy, “Crystal structure determination of sheep (Ovis aries) methemoglobin at 2.7 Å resolution,” To be published, 2023. PDB ID: 2QU0.
[14] R. C. Ladner, E. G. Heidner, and M. F. Perutz, “The structure of horse methaemoglobin at 2.0 Å resolution,” Journal of Molecular Biology, vol. 114, p. 385, 1977.
[15] S. S. Sundaresan, P. Ramesh, M. Thenmozhi, and M. N. Ponnuswamy, “Crystal structure of dog (Canis familiaris) hemoglobin,” To be published, 2023. PDB ID: 3GOU.
[16] Y. Morita, T. Yamada, M. Kureishi, K. Kihira, and T. Komatsu, “Quaternary structure analysis of a hemoglobin core in hemoglobin-albumin cluster,” J. Phys. Chem. B, vol. 122, p. 12031, 2018.
[17] T.-H. Lu, K. Panneerselvam, Y.-C. Liaw, P. Kan, and C.-J. Lee, “Structure determination of porcine haemoglobin,” Acta Crystallographica Section D, vol. 56, p. 304, 2000.
[18] M. Balasubramanian, P. S. Moorthy, K. Neelagandan, and M. N. Ponnuswamy, “Crystal structure determination of cat (Felis silvestris catus) hemoglobin at 2.9 Å resolution,” To be published, 2009. PDB ID: 3GYS.
[...]
- Akhilesh Shende (Author), Srishti Kewlani (Author), 2025, Decoding Structural Evolution. MATLAB Based Analysis of Ramachandran Plots in Hemoproteins, Munich, GRIN Verlag, https://www.grin.com/document/1555828