Multiple Sequence Alignment of Small Heat Shock Protein 1 Orthologs

Name: Multiple Sequence Alignment of Small Heat Shock Protein 1 Orthologs
License: https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Keywords: Small Heat Shock Protein 1

Research data

Authors: Bravo, Facundo Nicolás Eric

Contributors: Racigh, Vanesa Elizabeth; Rodriguez Sawicki, Luciana; Fornasari, Maria Silvina

Publisher: Consejo Nacional de Investigaciones Científicas y Técnicas

Deposit date: 28/11/2024

Creation date: 05/06/2024-10/06/2024

Subject classification:

Biochemistry and Molecular Biology

Abstract

The curated dataset comprises 474 protein sequences: 199 from invertebrates and 275 from vertebrates. After curation and the application of coverage and identity filters, no sequences from the other kingdoms searched (Plantae, Fungi, Eubacteria and Protista) remained in the dataset. The vertebrate sequences comprise 88 mammals, 124 fish, 26 birds, 26 reptiles and 11 amphibians. Analysis of the multiple sequence alignment (MSA) shows that the α-crystallin domain (ACD) is highly conserved in length, averaging 77.0 ± 0.5 amino acids, whereas the N-terminal region (NTR; 88.8 ± 18.3 amino acids) is longer and more variable than the C-terminal region (CTR; 34.7 ± 8.4 amino acids). CTR lengths are similar in vertebrates (32.1 ± 8.1) and invertebrates (38.2 ± 7.5). In contrast, NTR length differs much more between vertebrates (99.5 ± 16.6) and invertebrates (74.0 ± 6.5). This difference is due to a low-complexity region of variable length present in vertebrates, known in human HSPB1 as the inserted segment.
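Region lengths of this kind can be derived from an alignment roughly as follows. This is a minimal sketch, not the authors' procedure. It assumes that the NTR, ACD and CTR boundaries are given as alignment column ranges, that a region's length is the number of non-gap residues a sequence has inside that range, and that "±" denotes the sample standard deviation. The column boundaries, identifiers and toy sequences are hypothetical and not taken from the dataset.

```python
from statistics import mean, stdev

# Hypothetical region boundaries as half-open alignment column ranges [start, end).
# Real boundaries would come from locating the ACD in the actual MSA.
REGIONS = {"NTR": (0, 6), "ACD": (6, 12), "CTR": (12, 16)}


def region_lengths(aligned_seq, regions=REGIONS):
    """Count the non-gap residues of one aligned sequence inside each region."""
    return {
        name: sum(1 for ch in aligned_seq[start:end] if ch != "-")
        for name, (start, end) in regions.items()
    }


def summarize(records, group=None):
    """Return {region: (mean, sample SD)} for all records or for one group.

    records: list of (identifier, group, aligned_sequence) tuples.
    Each selected group needs at least two sequences for stdev().
    """
    selected = [seq for _, grp, seq in records if group is None or grp == group]
    lengths = [region_lengths(seq) for seq in selected]
    summary = {}
    for name in REGIONS:
        values = [entry[name] for entry in lengths]
        summary[name] = (mean(values), stdev(values))
    return summary


# Toy alignment: hypothetical identifiers and sequences, 16 columns each.
records = [
    ("vert_1", "vertebrate", "MTERRVACDEFGHKLM"),
    ("vert_2", "vertebrate", "MTE-RVACDEFGHK--"),
    ("invert_1", "invertebrate", "M---RVACDEFGHKL-"),
    ("invert_2", "invertebrate", "MS--RVACDEFGHKLM"),
]

for grp in (None, "vertebrate", "invertebrate"):
    label = grp or "all"
    for name, (m, sd) in summarize(records, grp).items():
        print(f"{label:12s} {name}: {m:.1f} ± {sd:.1f} aa")
```

Counting residues rather than alignment columns keeps gap-rich regions, such as a variable-length insertion, from inflating the lengths of sequences that lack it.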

Methods

Homologous protein sequences were recruited from the UniProtKB database with UniProt BLAST (BLASTp), using canonical human HSPB1 (UniProt ID P04792) as the query. To make the dataset more comprehensive, additional sequences were retrieved by applying taxonomic filters for specific groups, covering the major kingdoms Animalia, Plantae, Eubacteria, Fungi and Protista. After recruitment, an initial filtering step removed sequences that were partial, hypothetical or contained indeterminate residues, and duplicated sequences were eliminated to avoid redundancy. The remaining sequences were aligned and then filtered by percentage identity and sequence coverage, with thresholds of 30% identity and 40% coverage. From the resulting dataset, all HSPB1 sequences were manually curated to compile a set of ortholog HSPB1 sequences for subsequent analysis. Working with proteins encoded by orthologous genes makes it possible to study the evolutionary history of the product of a single gene, ensuring that the sequences correspond to the same protein in different organisms and, consequently, that they perform the same function.

Dataset characterization

Two subsets were generated from the ortholog dataset, one for vertebrates and one for invertebrates. The three sequence datasets (the complete set, the vertebrate subset and the invertebrate subset) were aligned with Clustal Omega within UGENE.
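The pre-alignment filters and the identity and coverage thresholds described above could be sketched as follows. This is an illustration under stated assumptions, not the authors' code. It assumes that partial and hypothetical entries can be recognised from keywords in the FASTA header ("fragment", "hypothetical", "uncharacterized"), that the IUPAC ambiguity codes X, B, Z and J count as indeterminate residues, and that identity and coverage are computed from a query row and a hit row taken from an alignment. Identity is taken as identical residues over columns where both rows have a residue, and coverage as the fraction of query residues aligned to a hit residue. Other tools define both measures differently. All function names are hypothetical.

```python
# Hypothetical header keywords used to flag partial or hypothetical entries.
EXCLUDE_KEYWORDS = ("fragment", "hypothetical", "uncharacterized")
# IUPAC codes for ambiguous or unknown amino acids (assumed "indeterminate").
AMBIGUOUS = set("XBZJ")


def passes_prefilter(header, seq):
    """Reject partial/hypothetical entries and sequences with indeterminate residues."""
    text = header.lower()
    if any(word in text for word in EXCLUDE_KEYWORDS):
        return False
    return not (set(seq.upper()) & AMBIGUOUS)


def deduplicate(entries):
    """Keep the first (header, sequence) entry for each distinct sequence."""
    seen, kept = set(), []
    for header, seq in entries:
        if seq not in seen:
            seen.add(seq)
            kept.append((header, seq))
    return kept


def identity_and_coverage(query_row, hit_row):
    """Percent identity and percent query coverage from two equal-length aligned rows."""
    if len(query_row) != len(hit_row):
        raise ValueError("aligned rows must have equal length")
    paired = [(q, h) for q, h in zip(query_row, hit_row) if q != "-" and h != "-"]
    query_residues = sum(1 for q in query_row if q != "-")
    if not paired or query_residues == 0:
        return 0.0, 0.0
    identical = sum(1 for q, h in paired if q == h)
    return 100 * identical / len(paired), 100 * len(paired) / query_residues


def passes_thresholds(query_row, hit_row, min_identity=30.0, min_coverage=40.0):
    """Apply the 30% identity and 40% coverage cut-offs."""
    identity, coverage = identity_and_coverage(query_row, hit_row)
    return identity >= min_identity and coverage >= min_coverage


# Toy entries (hypothetical headers and sequences).
entries = [
    ("tr|HYP1| Uncharacterized protein", "MTERRV"),
    ("tr|AMB1| Heat shock protein beta-1", "MTXRRV"),
    ("tr|OK1| Heat shock protein beta-1", "MTERRV"),
    ("tr|OK2| Heat shock protein beta-1", "MTERRV"),  # duplicate sequence
]
clean = deduplicate([e for e in entries if passes_prefilter(*e)])
print(clean)
print(identity_and_coverage("MTERRVACDE", "MSE-RVACD-"))
```

In this sketch only the `OK1` entry survives the pre-filter and deduplication. Taxonomic splitting into vertebrate and invertebrate subsets and the alignment itself (done with Clustal Omega in UGENE) are outside its scope.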

Keywords: Small Heat Shock Protein 1, Molecular evolution, Multiple Sequence Alignment, Orthologs

Resource identifier

URI: http://hdl.handle.net/11336/248901

Collections

Research Data (SEDE CENTRAL)

Citation

Bravo, Facundo Nicolás Eric (2024): Multiple Sequence Alignment of Small Heat Shock Protein 1 Orthologs. Consejo Nacional de Investigaciones Científicas y Técnicas. (dataset). http://hdl.handle.net/11336/248901

Terms of use

Good scientific practice calls for appropriate credit to be given through a citation. Use a citation format and follow these reuse rules.

Except where explicitly stated otherwise, this item is published under the following license: Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Unported (CC BY-NC-SA 2.5)