Practical Issues in Bioinformatics Assignment Example | Topics and Well Written Essays

Health sciences and medicine, Assignment Topic: Bioinformatics; CW1 – Database work, 26th October

Partial DNA sequence for a gene that your company is interested in:

CGGCGCCGCGAGCTTCTCCTCTCCTCACGACCGAGGCAGAGCAGTCATTATGGCGAACCTTGGCTGCTGGATGCTGGTTCTCTTTGTGGCCACATGGAGTGACCTGGGCCTCTGCAAGAAGCGCCCGAAGCCTGGAGGATGGAACACTGGGGGCAGCCGATACCCGGGGCAGGGCAGCCCTGGAGGCAACCGCTACCCACCTCAGGGCGGTGGTGGCTGGGGGCAGCCTCATGGTGGTGG

A short report telling them what data is publicly available for this gene.

1) Using NCBI BLAST, identify the most likely candidate for the complete gene.

a. What is the name of the gene?
Homo sapiens prion protein (PRNP) gene, Gene ID: 5621 (PRNP). Query coverage: 100%; E-value: 2e-21 [BLASTN 2.2.27+] (Zhang et al., 2000).

b. What organism does the gene come from?
Homo sapiens (human). Most of the query results were from Homo sapiens, which suggests that the partial DNA sequence most likely originated from humans.

c. In trying to find the function of a gene it can be useful to see how widely distributed it is amongst species. For example, is it limited to bacteria? From the BLAST output, what can you say about the distribution of the gene amongst different species?
The prion protein gene is not limited to humans: the BLAST hits also include other primates, such as the Sumatran orangutan (Pongo abelii) and Macaca fascicularis. Although the query sequence came from a primate, the prion protein gene is also found in other mammals, such as sheep and cattle.

2) Using the secondary databases, find out as much as you can about the functional and structural properties of the gene. Why is this gene significant? What does it do? What does it look like? Where is it found within the organism? Are there any related genes?
Accession CAA58442: amino acid sequence of the prion protein [human]:
MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGAVVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPVILLISFLIFLIVG

The gene exists as a single copy and encodes a glycosylphosphatidylinositol (GPI)-anchored membrane glycoprotein. It is located on chromosome 20. According to ProtParam (Gasteiger et al., 2005), the protein has a molecular weight of 26884.3 Da, a primary structure of 245 amino acids and a theoretical isoelectric point (pI) of 9.13. The normal cellular form of the gene product (PrPC) is largely alpha-helical and is anchored to the cell membrane via a lipid anchor; misfolding of this protein gives rise to a protease-resistant form (PrPSc). The protein is important because of its involvement in the etiology of human and livestock diseases, in which its misfolding may lead to neuronal degeneration.

Figure 1: NMR solution structure of the human prion protein (Zahn et al., 2000)

The protein is involved in synaptic plasticity and neuronal development; it may also play roles in iron uptake and homeostasis.

Related genes:
i) RNA-binding protein FUS isoform 1 [Homo sapiens]
ii) TATA-binding protein-associated factor 2N isoform 1 [Homo sapiens]
iii) Chain A, Mouse Prion Protein (121-231) Containing The Substitution F175a
iv) Single-stranded DNA-binding protein [Arthrobacter sp. Rue61a]
v) Hypothetical conserved protein [Oceanobacillus iheyensis HTE831]
vi) Translation initiation factor IF-2 [Corynebacterium glutamicum R]

3) Is this protein related to any diseases? What are they? What causes them? Are there any mutations of the gene associated with diseases?
Misfolding converts the normal prion protein (PrPC) into an abnormal variant (PrPSc) that is associated with a group of neurodegenerative diseases collectively termed transmissible spongiform encephalopathies (TSEs) or prion diseases (Taylor et al., 2009; Prusiner, 1998). Upon misfolding, the β-sheet content of the protein increases substantially, which causes the molecules to aggregate into large assemblies. Prion diseases include bovine spongiform encephalopathy (BSE, or mad cow disease) in cattle, scrapie in sheep and, in humans, Creutzfeldt-Jakob disease (CJD), fatal insomnia, Gerstmann-Sträussler-Scheinker (GSS) disease and variably protease-sensitive prionopathy (VPSPr). These diseases may be genetic, sporadic or acquired by infection. Although prions contain no nucleic acid, they are transmissible and consist entirely of the misfolded protein (PrPSc).

Among prion diseases, CJD is the most common in humans, and about 85% of CJD cases are sporadic (Torres et al., 2012). About 10% of cases (familial CJD) are associated with mutations in the prion gene, and fewer than 1% result from infection. Sporadic events therefore account for the large majority of CJD cases, while genetic (familial) and acquired cases together account for at most about 11%.

The normal prion protein may also have a neuroprotective function in the cerebrospinal fluid (CSF). Some studies have shown that it plays roles in signal transduction, cell survival and protection against oxidative stress (Watt and Hooper, 2005; Chen et al., 2003; Mouillet-Richard et al., 2000). In prion disease, therefore, conversion of the normal protein to the abnormal form may compromise these protective functions and thereby contribute to disease progression.
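Disease-linked changes in repeat number, described in the next paragraph, concern the octapeptide repeat region in the N-terminal half of the protein. As an illustrative sketch (not part of the original investigation), the Python code below locates the R1 nonapeptide (PQGGGGWGQ) and the R2-R4 octapeptide (PHGGGWGQ) in the CAA58442 sequence with a regular-expression search. The function and variable names are hypothetical; an exact motif search like this misses variant repeats and is not a substitute for the curated feature annotation in UniProt or GenBank.

```python
import re

# Prion protein sequence for accession CAA58442, as quoted in this report.
CAA58442 = (
    "MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYP"
    "PQGGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWN"
    "KPSKPKTNMKHMAGAAAAGAVVGGLGGYMLGSAMSRPIIHFGSDYEDRYY"
    "RENMHRYPNQVYYRPMDEYSNQNNFVHDCVNITIKQHTVTTTTKGENFTE"
    "TDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPVILLISFLIFL"
    "IVG"
)

R1_NONAPEPTIDE = "PQGGGGWGQ"  # encoded by R1
OCTAPEPTIDE = "PHGGGWGQ"      # encoded by R2, R3 and R4


def motif_positions(protein: str, motif: str) -> list[int]:
    """Return 1-based start positions of every exact occurrence of motif in protein."""
    # A zero-width lookahead lets re.finditer also report overlapping matches.
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [match.start() + 1 for match in pattern.finditer(protein)]


if __name__ == "__main__":
    r1 = motif_positions(CAA58442, R1_NONAPEPTIDE)
    octa = motif_positions(CAA58442, OCTAPEPTIDE)
    print("R1 nonapeptide at:", r1)
    print("Octapeptides at:", octa)
    print("Total repeats:", len(r1) + len(octa))
```

For the sequence as quoted in this report, the search should find R1 at position 51 and the octapeptides at positions 60, 68, 76 and 84, that is, five repeats in total, matching the R1-R2-R2-R3-R4 arrangement. A sequence carrying a repeat insertion or deletion would give a different count.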
In humans, the prion gene contains an octapeptide repeat region (R1-R2-R2-R3-R4). R1 encodes the nonapeptide PQGGGGWGQ, while the R2, R3 and R4 repeats encode the octapeptide PHGGGWGQ. The region therefore consists of five repeats of 24-27 bp: one nonapeptide and four octapeptide coding sequences. According to Li et al. (2011), both an increase and a decrease in the number of these repeats are linked to prion diseases. Point mutations in the prion gene, such as E200K and P102L, have been associated with familial CJD and GSS, respectively (Kong et al., 2004). Other mutations, such as insertions and deletions, have also been linked to familial prion diseases because they change the number of repeats. In conclusion, disease-associated mutations in the prion gene make the resulting protein more likely to adopt the abnormal prion conformation.

4) Describe how you carried out your investigation, including what databases you used and giving reasons for why you used those resources and supporting the evidence you gave for parts 1-3. Give your opinion as to the usefulness and accessibility of the different databases that you used [UniProt, Prosite etc.]

I submitted the partial DNA sequence to BLASTN on the NCBI website at http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&LINK_LOC=blasthome. BLASTN is the nucleotide version of the Basic Local Alignment Search Tool: it uses a DNA sequence as a query to search a nucleotide database. To increase the chance of retrieving the correct result, the search was optimised for highly similar sequences using megablast. The NCBI provides several nucleotide databases, so it was important to choose an appropriate one. Since no prior information was given on the source of the partial DNA sequence, the search was run against a general collection, the non-redundant (nr) nucleotide collection.
The nr collection combines sequences from several public databases (including GenBank, EMBL, DDBJ, PDB and RefSeq), and its breadth increases the chance of finding the correct sequence. The most likely match was chosen on the basis of the query coverage and E-value displayed in the results: the best sequence had the smallest E-value and a large query coverage. Query coverage is the percentage of the query sequence covered by the alignment with a given database sequence, and the E-value is the number of hits with an equal or better score expected by chance in a database of that size, so a smaller E-value indicates a more significant match. Most of the database sequences had a query coverage of 100% and significant E-values. Most of them were also from humans and shared a common Gene ID, which is possible because different groups may have submitted different versions of the human prion gene to the nucleotide collection. After the human DNA sequence had been identified, hits from organisms other than humans were selected as related sequences, again judged by their E-values and query coverage.

The protein sequence of the human prion protein was then used as the query in a BLASTP search of the NCBI non-redundant (nr) protein database; BLASTP searches a protein database using a protein query. The physicochemical properties of the protein were investigated by submitting its primary sequence to the ProtParam tool on the ExPASy server; these properties included the molecular weight and the theoretical isoelectric point. Using the accession number of the protein, the sequence was then searched in the Protein Data Bank (PDB), a resource that archives experimentally determined structures of biological macromolecules. The sequence itself can also be used to search the PDB.
The PDB search provided the structure of the prion protein, showing its alpha-helices (Figure 1). For both nucleotide and protein BLAST, the results for each query contain links to the literature associated with most of the sequences submitted to the databases. These papers can be accessed through PubMed, and they provide detailed information about the sequences, including the reasons for sequencing them, related sequences and any other information a researcher may need. The information on prion proteins and the diseases associated with their misfolding was sourced in this way.

In conclusion, the amount of publicly available biological data has increased greatly in recent years, and almost any sequence determined in the laboratory may find a match in these databases. Information on the objectives of the research group that sequenced the DNA can also be accessed easily through links in the results. However, redundancy in the data is a common source of confusion. Clear judgment is therefore needed when searching for a sequence of interest, because the databases contain hypothetical sequences and very short sequences that may match the query even though they do not come from a related source. Hypothetical sequences are derived by automated translation of submitted sequences. Statistics alone may therefore not be enough to decide which database sequence matches the query; careful scrutiny of the literature supporting a given sequence is also necessary.

References

Chen, S., Mange, A., Dong, L., Lehmann, S. and Schachner, M. (2003) ‘Prion protein as trans-interacting partner for neurons is involved in neurite outgrowth and neuronal survival’ Mol Cell Neurosci. 22 pp. 227-233.
Gasteiger, E., Hoogland, C., Gattiker, A., Duvaud, S., Wilkins, M.R., Appel, R.D. and Bairoch, A. (2005) ‘Protein Identification and Analysis Tools on the ExPASy Server’ In: Walker, J.M. (ed.) The Proteomics Protocols Handbook. Totowa, NJ: Humana Press. pp. 571-607.
Kong, Q., Surewicz, W.K., Petersen, R.B., Zou, W., Chen, S.G., Gambetti, P., Parchi, P., Capellari, S., Goldfarb, L., Montagna, P., Lugaresi, E., Piccardo, P. and Ghetti, B. (2004) ‘Inherited Prion Disease’ In: Prusiner, S.B. (ed.) Prion Biology and Diseases. 2nd ed. New York: Cold Spring Harbor Laboratory Press. pp. 673-776.
Li, B., Qing, L., Yan, J. and Kong, Q. (2011) ‘Instability of the Octarepeat Region of the Human Prion Protein Gene’ PLoS ONE. 6(10): e26635. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0026635 [accessed 27 October 2012].
Mouillet-Richard, S., Ermonval, M., Chebassier, C., Laplanche, J.L., Lehmann, S., Launay, J.M. and Kellermann, O. (2000) ‘Signal transduction through prion protein’ Science. 289 pp. 1925-1928.
Prusiner, S.B. (1998) ‘Prions’ Proc Natl Acad Sci USA. 95 pp. 13363-13383.
Taylor, D.R., Whitehouse, I.J. and Hooper, N.M. (2009) ‘Glypican-1 Mediates Both Prion Protein Lipid Raft Association and Disease Isoform Formation’ PLoS Pathog. 5(11): e1000666. http://www.plospathogens.org/article/info%3Adoi%2F10.1371%2Fjournal.ppat.1000666 [accessed 27 October 2012].
Torres, M., Cartier, L., Matamala, J.M., Hernandez, N., Woehlbier, U. and Hetz, C. (2012) ‘Altered Prion Protein Expression Pattern in CSF as a Biomarker for Creutzfeldt-Jakob Disease’ PLoS ONE. 7(4): e36159. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0036159 [accessed 27 October 2012].
Watt, N.T. and Hooper, N.M. (2005) ‘Reactive oxygen species (ROS)-mediated beta-cleavage of the prion protein in the mechanism of the cellular response to oxidative stress’ Biochem Soc Trans. 33 pp. 1123-1125.
Zahn, R., Liu, A., Luhrs, T., Riek, R., Von Schroetter, C., Garcia, F.L., Billeter, M., Calzolai, L., Wider, G. and Wuthrich, K. (2000) ‘NMR solution structure of the human prion protein’ Proc Natl Acad Sci USA. 97(1) pp. 145-150.
Zhang, Z., Schwartz, S., Wagner, L. and Miller, W. (2000) ‘A greedy algorithm for aligning DNA sequences’ J Comput Biol. 7(1-2) pp. 203-214.
