dc.date.available
2024-11-28T13:28:22Z
dc.identifier.citation
Bravo, Facundo Nicolás Eric; (2024): Multiple Sequence Alignment of Small Heat Shock Protein 1 Orthologs. Consejo Nacional de Investigaciones Científicas y Técnicas. (dataset). http://hdl.handle.net/11336/248901
dc.identifier.uri
http://hdl.handle.net/11336/248901
dc.description.abstract
The curated dataset comprises 474 protein sequences: 199 from invertebrates and 275 from vertebrates. After curation and the coverage and identity filters, no sequences from other kingdoms (Plantae, Fungi, Eubacteria, Protista) remained in the dataset. Among the vertebrates, the dataset includes 88 mammals, 124 fish, 26 birds, 26 reptiles, and 11 amphibians. The multiple sequence alignment (MSA) shows that the alpha-crystallin domain (ACD) is highly conserved in length, averaging 77.0 ± 0.5 amino acids, while the N-terminal region (NTR; 88.8 ± 18.3) is longer and more variable than the C-terminal region (CTR; 34.7 ± 8.4). CTR lengths are similar in vertebrates (32.1 ± 8.1) and invertebrates (38.2 ± 7.5). In contrast, NTR length differs markedly between vertebrates (99.5 ± 16.6) and invertebrates (74.0 ± 6.5). This difference is due to a low-complexity region of variable length present in vertebrates, known as the inserted segment in human HSPB1.
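The figures in the abstract can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the abstract and assumes that the overall NTR and CTR means are count-weighted averages of the vertebrate and invertebrate means; the variable and function names are illustrative and not part of the dataset.

```python
# Counts and mean lengths copied from the abstract; nothing here is read
# from the dataset files themselves.
counts = {"vertebrates": 275, "invertebrates": 199}
vertebrate_classes = {
    "mammals": 88, "fish": 124, "birds": 26, "reptiles": 26, "amphibians": 11,
}
mean_length = {
    "NTR": {"vertebrates": 99.5, "invertebrates": 74.0},
    "CTR": {"vertebrates": 32.1, "invertebrates": 38.2},
}


def pooled_mean(region):
    """Count-weighted mean length of a region over both subsets (assumption)."""
    total = sum(counts.values())
    return sum(counts[group] * mean_length[region][group] for group in counts) / total


total_sequences = sum(counts.values())
vertebrate_total = sum(vertebrate_classes.values())
pooled = {region: round(pooled_mean(region), 1) for region in mean_length}

print(total_sequences)   # 474
print(vertebrate_total)  # 275
print(pooled)            # {'NTR': 88.8, 'CTR': 34.7}
```

Under that assumption the pooled values agree with the overall means reported in the abstract (88.8 for the NTR and 34.7 for the CTR), and the class counts add up to the stated vertebrate total.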
dc.rights
info:eu-repo/semantics/openAccess
dc.rights.uri
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.title
Multiple Sequence Alignment of Small Heat Shock Protein 1 Orthologs
dc.type
dataset
dc.date.updated
2024-11-28T11:28:41Z
dc.description.fil
Fil: Bravo, Facundo Nicolás Eric. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
dc.datacite.PublicationYear
2024
dc.datacite.Creator
Bravo, Facundo Nicolás Eric
dc.datacite.affiliation
Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología
dc.datacite.affiliation
Consejo Nacional de Investigaciones Científicas y Técnicas
dc.datacite.affiliation
Consejo Nacional de Investigaciones Científicas y Técnicas
dc.datacite.affiliation
Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología
dc.datacite.affiliation
Consejo Nacional de Investigaciones Científicas y Técnicas
dc.datacite.affiliation
Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología
dc.datacite.affiliation
Consejo Nacional de Investigaciones Científicas y Técnicas
dc.datacite.affiliation
Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología
dc.datacite.publisher
Consejo Nacional de Investigaciones Científicas y Técnicas
dc.datacite.subject
Biochemistry and Molecular Biology
dc.datacite.subject
Biological Sciences
dc.datacite.subject
NATURAL AND EXACT SCIENCES
dc.datacite.ContributorType
DataCurator
dc.datacite.ContributorType
ContactPerson
dc.datacite.ContributorType
RelatedPerson
dc.datacite.ContributorName
Racigh, Vanesa Elizabeth
dc.datacite.ContributorName
Rodriguez Sawicki, Luciana
dc.datacite.ContributorName
Fornasari, Maria Silvina
dc.datacite.date
05/06/2024-10/06/2024
dc.datacite.DateType
Created
dc.datacite.language
eng
dc.datacite.version
1.0
dc.datacite.description
Homologous protein sequences were retrieved from the UniProtKB database with UniProt BLASTp, using the canonical human HSPB1 sequence (UniProt ID P04792) as the query. To ensure a comprehensive dataset, additional sequences were sourced by filtering by taxonomic group. Recruitment covered the major kingdoms: Animalia, Plantae, Eubacteria, Fungi, and Protista. After recruitment, an initial filtering step removed sequences that were partial, hypothetical, or contained indeterminate residues. Duplicate sequences were also removed to avoid redundancy. The remaining sequences were aligned and then filtered by percentage identity and sequence coverage, with thresholds of 30% identity and 40% coverage. From the resulting dataset, all HSPB1 sequences were manually curated to compile a set of orthologous HSPB1 sequences for subsequent analysis. Working with proteins encoded by orthologous genes makes it possible to study the evolutionary history of the product of a single gene, ensuring that the sequences represent the same protein in different organisms and, consequently, that they perform the same function. Dataset characterization: Two subsets were generated from the ortholog dataset, one for vertebrates and one for invertebrates. The three sequence datasets (the complete set, the vertebrate subset, and the invertebrate subset) were aligned using Clustal Omega within UGENE.
dc.datacite.DescriptionType
Methods
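The filtering steps in the methods description can be sketched as follows. This is a minimal illustration and not the author's pipeline: the `Record` type, the function names, the use of "fragment" or "hypothetical" in the description to flag partial or hypothetical entries, the use of `X` as the indeterminate residue, and the inclusive (`>=`) thresholds are all assumptions. Only the 30% identity and 40% coverage values come from the description, which does not state how identity and coverage were computed, so they are taken here as precomputed percentages.

```python
from dataclasses import dataclass

MIN_IDENTITY = 30.0  # percent identity threshold, from the description
MIN_COVERAGE = 40.0  # percent coverage threshold, from the description


@dataclass(frozen=True)
class Record:
    """Hypothetical container for one retrieved sequence."""
    accession: str
    description: str
    sequence: str


def initial_filter(records):
    """Drop partial, hypothetical, indeterminate, and duplicate sequences."""
    seen = set()
    kept = []
    for rec in records:
        desc = rec.description.lower()
        seq = rec.sequence.upper()
        # Assumption: partial and hypothetical entries are flagged in the description.
        if "fragment" in desc or "hypothetical" in desc:
            continue
        # Assumption: 'X' marks an indeterminate residue.
        if "X" in seq:
            continue
        # Keep only the first occurrence of each identical sequence.
        if seq in seen:
            continue
        seen.add(seq)
        kept.append(rec)
    return kept


def passes_thresholds(identity_pct, coverage_pct):
    """Apply the identity and coverage cut-offs (inclusive, by assumption)."""
    return identity_pct >= MIN_IDENTITY and coverage_pct >= MIN_COVERAGE


# Toy records with made-up accessions and sequences, for illustration only.
toy = [
    Record("A1", "Heat shock protein beta-1", "MTERRVPF"),
    Record("A2", "Heat shock protein beta-1 (Fragment)", "MTERR"),
    Record("A3", "Hypothetical protein", "MSLLPF"),
    Record("A4", "Small heat shock protein", "MTXRRVPF"),
    Record("A5", "Heat shock protein beta-1 isoform", "MTERRVPF"),
]
kept_accessions = [rec.accession for rec in initial_filter(toy)]
print(kept_accessions)
```

In the workflow described, the threshold step follows an alignment of the remaining sequences, and manual curation of orthologs comes after that; neither step is reproduced by this sketch.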
dc.subject.keyword
Small Heat Shock Protein 1
dc.subject.keyword
Molecular evolution
dc.subject.keyword
Multiple Sequence Alignment
dc.subject.keyword
Orthologs
dc.datacite.resourceTypeGeneral
dataset
dc.conicet.datoinvestigacionid
21442
dc.conicet.justificacion
This is a multiple sequence alignment of ortholog sequences of heat shock protein 1. The sequences were obtained by searching public databases. Both the collected data and their subsequent use are open to anyone, anywhere.
dc.datacite.formatedDate
2024