Reverend Thomas Bayes Theorem

Bayesian Theorem

Introduction

The Reverend Thomas Bayes developed Bayes' theorem of probability. The theorem describes how the probability of a hypothesis is affected by a new set of evidence, and it is applied in a variety of contexts to explore the relationship between theory and evidence. Today its applications are broad, ranging from mathematics to the sciences. It relates current beliefs to previous beliefs and to the evidence, which allows the researcher to determine how beliefs should change once the evidence is observed. Simon Jackman (2009) defines Bayes' theorem as 'a theorem that illustrates conditional probability of the set on the given observed outcome, that is obtained from the knowledge of the probability and its outcome'. Bayes' theorem follows from the basic axioms of probability, in particular the definition of conditional probability. It expresses subjective degrees of belief, which is the foundation of Bayesian statistics. The theorem can be written as

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where

P(A) is the prior probability of event A, that is, its probability before B is observed;
P(A | B) is the conditional (posterior) probability of A given that B has occurred;
P(B | A) is the conditional probability of B given that A has occurred (the likelihood);
P(B) is the marginal probability that B occurs, with P(B) > 0.

Bayesian statistical methods provide an in-depth understanding of events. The theorem is applied in many fields and subjects, such as science, biology, mathematics and finance, to determine the relationship between events. In finance, for example, the Bayesian method is adopted for financial forecasting. One of the major advantages of Bayes' theorem is the consideration it gives to previous information.
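The following minimal sketch shows the formula in use. The Python function computes P(A | B) for two events. It assumes that P(B) is obtained from the law of total probability, P(B) = P(B | A)P(A) + P(B | not A)P(not A). The function name `bayes_posterior` and the numbers in the usage line are hypothetical and chosen only for illustration: a 1% prior, P(B | A) = 0.90 and P(B | not A) = 0.05.

```python
def bayes_posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A | B) by Bayes' theorem.

    P(B) is computed with the law of total probability (an assumption of this sketch):
    P(B) = P(B | A) P(A) + P(B | not A) P(not A).
    """
    if not 0.0 <= prior_a <= 1.0:
        raise ValueError("prior_a must be a probability")
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    if p_b == 0.0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * prior_a / p_b


# Hypothetical numbers: A = "condition present", B = "test positive".
posterior = bayes_posterior(prior_a=0.01, p_b_given_a=0.90, p_b_given_not_a=0.05)
print(round(posterior, 4))  # 0.1538
```

The test is fairly accurate, yet the posterior stays near 15% because the prior is small. This is the sense in which the theorem gives weight to previous information.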
Many statisticians disregard previous information in order to demonstrate the objectivity of their current analysis. The Bayesian approach instead combines both sets of information, the prior information and the current data, and makes that combination explicit. A further significant advantage of Bayes' theorem is that it provides direct probability statements about parameters. A Bayesian credible interval can be read as the probability that the parameter lies in the interval, which is how confidence intervals are often, incorrectly, interpreted. Frequentist statistics relies on a number of different tools, whereas the Bayesian approach uses a single one, Bayes' theorem. The Bayesian approach can also be used in situations where many frequentist tools fall short.

Likelihood, Prior, Posterior

In Bayes' theorem, the posterior (conditional) probability is obtained from unconditional probabilities using the multiplication rule: prior × likelihood, divided by the sum of prior × likelihood over all possible parameter values. The posterior is the conditional probability of a random event or uncertain proposition once the relevant evidence or background has been taken into account (Jackman, 2009). It is a conditional distribution based on the evidence obtained from the experiment, so the evidence relevant to a particular case is what is considered for that case. The likelihood is the conditional probability of the observed evidence B given that A has occurred, P(B | A). Note that the likelihood is regarded as a function of A with the observed B held fixed: it is the weight that the observed evidence B gives to each possible A.

Subjective probability and its application in Bayes' theorem

Subjective probability is essentially a summary of one's beliefs about how likely events are to occur.
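The multiplication rule described under Likelihood, Prior, Posterior can be sketched for a coin whose probability of heads θ is restricted to a few candidate values. The rule is prior × likelihood, divided by the sum over all possible parameter values. The prior weights play the role of the subjective beliefs just introduced. The candidate values, the prior and the data (7 heads in 10 tosses) are all hypothetical. `discrete_posterior` is an illustrative name, not a library function.

```python
def discrete_posterior(thetas, prior, heads, tosses):
    """Posterior over a finite set of candidate values of theta = P(heads).

    posterior_i = prior_i * likelihood_i / sum_j(prior_j * likelihood_j),
    with likelihood_i = theta_i**heads * (1 - theta_i)**(tosses - heads).
    The binomial coefficient is omitted because it cancels in the ratio.
    """
    products = [p * t**heads * (1 - t) ** (tosses - heads) for t, p in zip(thetas, prior)]
    total = sum(products)
    return [x / total for x in products]


thetas = [0.3, 0.5, 0.7]   # hypothetical candidate values of theta
prior = [0.25, 0.5, 0.25]  # hypothetical subjective prior: a fair coin is most plausible
posterior = discrete_posterior(thetas, prior, heads=7, tosses=10)
for t, p in zip(thetas, posterior):
    print(f"theta={t}: posterior={p:.3f}")
```

Starting from a prior that favours a fair coin, 7 heads in 10 tosses shift most of the posterior weight toward θ = 0.5 and θ = 0.7.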
In Bayesian statistics, subjective probabilities are assigned to the possible parameter values before the data are considered. These beliefs form the prior, which describes how plausible each possible parameter value is before the data are seen. Two probability concepts are applied in different ways in connection with Bayes' theorem: objective probabilities and subjective probabilities. Jackman (2009) describes subjective probability as 'probability that corresponds to personal beliefs that are rational and have coherence constraints with respect to probabilities'. These probabilities describe degrees of belief about the parameters. The prior must therefore be subjective, so that our beliefs about a parameter can be revised in the light of the data. The beliefs placed on the parameters should also be based on previous events or experience. Different people can therefore hold different, equally subjective beliefs. One person may consider a coin fair, so that heads and tails are equally likely. Another may believe from past experience that the coin is slightly asymmetrical and that the probability of heads is 0.49.

Beta-Binomial

The beta-binomial distribution is described as "the simple binomial distribution in which the probability of success is random and not fixed. Moreover, it specifically follows the beta distribution" (Bolstad, 2004). The beta-binomial distribution is widely used in Bayesian analysis because it combines two distributions, the binomial distribution and the beta distribution. Both are explained below with examples (Haug, 2012).
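The definition quoted above describes a binomial count whose success probability is itself drawn from a beta distribution, and this can be checked numerically. The sketch uses hypothetical parameters n = 5, a = 2, b = 3. It compares a two-stage simulation (draw p from Beta(a, b), then count successes) with the standard closed-form beta-binomial probability $\binom{n}{k} B(k+a, n-k+b) / B(a, b)$. The helper names are illustrative. Only `math.lgamma`, `math.comb` and `random.Random.betavariate` come from the Python standard library (Python 3.8 or later).

```python
import math
import random


def log_beta(x: float, y: float) -> float:
    """Logarithm of the beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def beta_binomial_pmf(k: int, n: int, a: float, b: float) -> float:
    """P(K = k) when K ~ Binomial(n, p) and p ~ Beta(a, b)."""
    return math.comb(n, k) * math.exp(log_beta(k + a, n - k + b) - log_beta(a, b))


def simulate_beta_binomial(n: int, a: float, b: float, draws: int, seed: int = 0):
    """Two-stage simulation: draw p from Beta(a, b), then count successes in n trials."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(draws):
        p = rng.betavariate(a, b)
        successes = sum(rng.random() < p for _ in range(n))
        counts[successes] += 1
    return [c / draws for c in counts]


n, a, b = 5, 2.0, 3.0  # hypothetical parameters
exact = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
simulated = simulate_beta_binomial(n, a, b, draws=100_000)
for k in range(n + 1):
    print(f"k={k}: exact={exact[k]:.4f} simulated={simulated[k]:.4f}")
```

With 100,000 draws, the two columns should agree to about two decimal places.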
First, the binomial distribution gives the probability of k successes in n independent trials, where each trial has the same probability of success, π. Three common situations illustrate its characteristics (Bolstad, 2004): coin tossing, drawing with replacement, and random sampling from a large population. When a coin is tossed n times, the number of successes can be counted as the number of heads. There are n trials, and each trial is either a success or a failure. The probability of success, or when sampling, the proportion of the population with the attribute, is denoted by π. The binomial distribution therefore has two parameters, n and π, and the number of successes in the n trials is denoted by k (Bolstad, 2004). The binomial probability function is

$$P(K = k \mid n, \pi) = \binom{n}{k}\,\pi^{k}(1-\pi)^{n-k}, \qquad k = 0, 1, \dots, n.$$

The beta distribution is the conjugate prior for the binomial success probability: a beta prior combined with a binomial likelihood gives a posterior in the same beta family. The beta(a, b) density is

$$g(x; a, b) = k \times x^{a-1}(1-x)^{b-1}, \qquad 0 \le x \le 1,$$

where the factor $x^{a-1}(1-x)^{b-1}$ determines the shape of the curve. The only constant is $k = \Gamma(a+b) / (\Gamma(a)\Gamma(b))$, which makes the density integrate to one. Combining the two distributions, by averaging the binomial probability over a beta-distributed π, gives the beta-binomial distribution:

$$P(K = k \mid n, a, b) = \binom{n}{k}\,\frac{B(k + a,\; n - k + b)}{B(a, b)}, \qquad k = 0, 1, \dots, n,$$

where B(·, ·) is the beta function.

Maximum likelihood and maximum a posteriori

The maximum likelihood estimate (MLE) is an estimation technique, widely used in statistics and a useful point of comparison in Bayesian statistics. It finds the parameter values under which the observations are most probable. Given a probability model for the data, the MLE maximizes the joint likelihood function of the observed data with respect to one or more parameters (Jackman, 2009).
In other words, the estimated parameters are those most consistent with the observed data, compared with the other values in the parameter space. Note that in this framework the likelihood is not a conditional probability in the Bayesian sense, because the parameters are not treated as random variables (Jackman, 2009). They are fixed but unknown. The maximum likelihood estimate is obtained by maximizing the probability of the observed sample with respect to the parameters. Writing θ for the parameters of the probability distribution and D for the data set, Bayes' theorem reads

$$p(\theta \mid D) = \frac{p(D \mid \theta)\,p(\theta)}{p(D)}, \qquad \text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}.$$

Example of maximum likelihood estimate

Based on the above equation, the MLE seeks the point value of θ that maximizes the likelihood p(D | θ). This value, denoted $\hat{\theta}$, is an estimate rather than a random variable. The MLE ignores the prior p(θ), which is equivalent to treating it as a constant. Our prior beliefs are therefore not injected, and $\hat{\theta}$ is simply the value of θ under which the observed data are most likely (Jackman, 2009).

Maximum a posteriori (MAP) estimation

In Bayesian estimation, θ is treated as a random variable, and the object of interest is the posterior distribution p(θ | D). The posterior is a full probability density, so it is more general than the MLE, which only maximizes the likelihood. The maximum a posteriori (MAP) estimate is the mode of the posterior distribution: a point estimate of an unobserved quantity based on the empirical data. MAP estimation is closely related to Fisher's method of maximum likelihood. It differs because its optimization objective also includes a prior distribution over the quantity to be estimated.
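For a binomial proportion with a beta prior, the MLE and the MAP estimate both have closed forms, which makes the difference easy to see. In the sketch below, the data (7 heads in 10 tosses) and the Beta(5, 5) prior are hypothetical. The MAP formula is the mode of the conjugate Beta(a + k, b + n - k) posterior and is valid only when both of its parameters exceed 1. The posterior mean is included for comparison.

```python
def binomial_mle(k: int, n: int) -> float:
    """MLE of the success probability pi: the maximizer of pi**k * (1 - pi)**(n - k)."""
    return k / n


def beta_posterior_mode(k: int, n: int, a: float, b: float) -> float:
    """MAP estimate: mode of the Beta(a + k, b + n - k) posterior.

    The formula is valid when a + k > 1 and b + n - k > 1.
    """
    return (a + k - 1) / (a + b + n - 2)


def beta_posterior_mean(k: int, n: int, a: float, b: float) -> float:
    """Mean of the Beta(a + k, b + n - k) posterior, for comparison."""
    return (a + k) / (a + b + n)


k, n = 7, 10   # hypothetical data: 7 heads in 10 tosses
a, b = 5, 5    # hypothetical Beta(5, 5) prior centred on a fair coin

print(binomial_mle(k, n))               # 0.7
print(beta_posterior_mode(k, n, a, b))  # 11/18, about 0.611: pulled toward 0.5 by the prior
print(beta_posterior_mean(k, n, a, b))  # 0.6
print(beta_posterior_mode(k, n, 1, 1))  # 0.7: a uniform Beta(1, 1) prior gives back the MLE
```

A uniform Beta(1, 1) prior is constant in π. With that prior, the MAP estimate coincides with the MLE.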
The MAP estimate is a point estimate that summarizes the posterior distribution. It is the Bayes estimator under a 0-1 loss function (in the limit). The posterior mean and the posterior median, by contrast, are optimal under squared-error loss and absolute (linear) error loss respectively. When the posterior does not have a simple analytic form, numerical techniques such as Monte Carlo methods are used to find its mode.

Example to show the difference between MLE and MAP

The concepts of maximum likelihood (MLE) and maximum a posteriori (MAP) estimation are easily confused because they share some features and differ in others. The MLE uses only the specific information in the data, whereas MAP combines the data with more general prior information. The MLE estimates the parameters most consistent with the observed data. MAP instead estimates the mode of the posterior distribution, a point estimate of an unobserved quantity. Unlike the MLE, the MAP estimate therefore depends on both the empirical data and the prior (Stauffer, 2007). In short, the MLE is the value of the parameter that maximizes the likelihood, whereas the MAP estimate is the value that maximizes the posterior distribution. MAP still depends on the likelihood, because the posterior is proportional to the likelihood times the prior. When the prior distribution is constant (uniform), there is no difference between the MLE and the MAP estimate (Stauffer, 2007). Regression gives a clear example of the difference between maximum likelihood and maximum a posteriori estimation, and it is easy to understand.
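A minimal sketch can show the difference for the regression example, using the linear basis-function model $y(x; \mathbf{w}) = \sum_j w_j\,\phi_j(x)$. It rests on three assumptions:

- the targets have Gaussian noise with variance σ², under which the MLE is ordinary least squares;
- each weight has a zero-mean Gaussian prior with variance τ², under which the MAP estimate is ridge (L2-penalized) least squares;
- the ridge penalty is λ = σ²/τ².

The two basis functions φ0(x) = 1 and φ1(x) = x and the data points are hypothetical. The 2 × 2 system is solved by hand, so the sketch needs no external library.

```python
def fit_linear_basis(xs, ts, lam=0.0):
    """Fit y(x; w) = w0 * phi0(x) + w1 * phi1(x) with phi0(x) = 1 and phi1(x) = x.

    Solves (Phi^T Phi + lam * I) w = Phi^T t.
    lam = 0 gives the MLE under Gaussian noise (ordinary least squares);
    lam = sigma**2 / tau**2 gives the MAP estimate under a zero-mean Gaussian
    prior with variance tau**2 on each weight (ridge regression).
    """
    phis = [(1.0, x) for x in xs]
    # Entries of the symmetric matrix A = Phi^T Phi + lam * I and the vector c = Phi^T t.
    a00 = sum(p[0] * p[0] for p in phis) + lam
    a01 = sum(p[0] * p[1] for p in phis)
    a11 = sum(p[1] * p[1] for p in phis) + lam
    c0 = sum(p[0] * t for p, t in zip(phis, ts))
    c1 = sum(p[1] * t for p, t in zip(phis, ts))
    det = a00 * a11 - a01 * a01
    # Cramer's rule for the 2x2 system A w = c.
    w0 = (c0 * a11 - a01 * c1) / det
    w1 = (a00 * c1 - a01 * c0) / det
    return w0, w1


xs = [0.0, 1.0, 2.0, 3.0]   # hypothetical inputs
ts = [1.1, 2.9, 5.2, 6.8]   # hypothetical noisy targets
w_mle = fit_linear_basis(xs, ts)           # least squares
w_map = fit_linear_basis(xs, ts, lam=1.0)  # weights shrunk toward zero by the prior
print("MLE:", w_mle)
print("MAP:", w_map)
```

Setting λ = 0 corresponds to a flat prior. The MAP estimate then equals the MLE, in line with the remark above that the two coincide when the prior is constant.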
The problem is to fit a function to samples t using a linear combination of a set of basis functions $\phi_j(x)$:

$$y(x; \mathbf{w}) = \sum_{j} w_j\,\phi_j(x)$$

Choosing the prior distribution

Before choosing a prior distribution, one should make sure that it matches one's prior beliefs. Probability statements about the parameters must be interpreted as "degrees of belief". They are called degrees of belief because each individual has his or her own prior over the possible parameter values. Prior distributions can be classified as subjective or objective, and as informative or non-informative. According to Leonard and Hsu (1999), an appropriate prior distribution should be subjective, because the prior is determined by the analyst's beliefs. An appropriate subjective prior expresses probabilities on the basis of personal experience. An incorrect or inappropriate choice of prior can lead to incorrect inferences (Bolstad, 2004). A prior distribution is needed in order to use Bayes' theorem: it expresses our beliefs about the parameter and its possible values before the data are observed. Bayes' theorem can then be summarized as: the posterior is proportional to the prior times the likelihood. The prior distribution represents the uncertainty about the parameters before the data are examined. For this reason, choosing it is the most sensitive aspect of a Bayesian analysis (Savchuk & Tsokos, 2011).

Methods of Estimation

There are different methods of estimation, which can be used to find the important elements of an estimate.
The two main kinds of estimation are point estimation and interval estimation. The methods of estimation also affect hypothesis testing, and they broadly follow evaluation designs (Haug, 2012).

Point Estimator

In statistics, point estimation is the process of finding a single value for each parameter. A point estimate is an approximation rather than an exact value. On its own, it does not convey how accurate it is (Haug, 2012). Several methods can be used for point estimation, and a point estimator should ideally be consistent, unbiased and efficient. A commonly used method is maximum likelihood, which uses differential calculus to find the maximum of the likelihood. The single value is usually calculated from sample data. Bayesian point estimation, by comparison, is based on the posterior distribution, and several Bayesian point estimators are used as summaries of it. The usual measures of central tendency of the posterior distribution are the posterior mean, the posterior median and the posterior mode (Leonard & Hsu, 1999). For comparison with the Bayesian estimators, a simple rule gives the frequentist estimator of the population mean µ: use the sample analog, whose behaviour is known from its sampling distribution. The most important criterion for this estimator is that it should be unbiased.
Thus, the frequentist estimator is $\hat{\mu}_f = \bar{y}$, the sample mean, which is an unbiased estimator of µ (Bolstad, 2004).

Interval estimation

Interval estimation is contrasted with point estimation. An interval estimate is a range of values that contains the required parameter with high probability. In frequentist inference, interval estimation produces confidence intervals; in Bayesian inference, it produces credible intervals. An advantage of interval estimation is that it gives a range of plausible values for an unknown population parameter rather than a single value. The intervals are chosen so that the parameter falls within them with high probability. The intervals most commonly used in frequentist estimation are confidence intervals. They are used to analyze parameters from a random sample of the population, and they are derived from probability theory (Haug, 2012). A confidence interval for the population mean µ has a high probability of containing the true value. In frequentist estimation, however, the parameter is treated as a fixed constant, and the high probability refers to the sampling distribution. The interval is constructed so that it would contain the true value for a high proportion of all the possible samples that could be obtained. The frequentist confidence interval can be written as estimator ± critical value × standard deviation of the estimator.

Hypothesis testing: comparing the frequentist and Bayesian approaches

A hypothesis test is a statistical method used to make decisions from a given data set. It is a way of assessing the credibility of a claim in the light of the data.
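The interval formula above and the one-sided tests compared in the following discussion can be set side by side for the simplest case, a normal mean with known standard deviation σ. The sketch assumes a flat prior for µ, under which the posterior is Normal(ȳ, σ/√n). The sample values (ȳ = 10.4, σ = 2, n = 25, µ0 = 10) are hypothetical. It uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist


def normal_mean_summary(ybar: float, sigma: float, n: int, mu0: float, level: float = 0.95):
    """Frequentist and flat-prior Bayesian summaries for a normal mean with known sigma.

    Under a flat prior (an assumption of this sketch) the posterior of mu is Normal(ybar, sigma / sqrt(n)).
    """
    se = sigma / n ** 0.5                      # standard deviation of the estimator ybar
    z = NormalDist().inv_cdf(0.5 + level / 2)  # critical value, about 1.96 for 95%
    confidence_interval = (ybar - z * se, ybar + z * se)
    # Frequentist one-sided test of H0: mu <= mu0 against H1: mu > mu0.
    p_value = 1 - NormalDist().cdf((ybar - mu0) / se)
    # Bayesian counterparts computed from the flat-prior posterior.
    posterior = NormalDist(mu=ybar, sigma=se)
    credible_interval = (posterior.inv_cdf(0.5 - level / 2), posterior.inv_cdf(0.5 + level / 2))
    posterior_prob_h0 = posterior.cdf(mu0)     # P(mu <= mu0 | data)
    return confidence_interval, credible_interval, p_value, posterior_prob_h0


ci, cred, p, post_h0 = normal_mean_summary(ybar=10.4, sigma=2.0, n=25, mu0=10.0)
print("95% confidence interval:", ci)
print("95% credible interval:  ", cred)
print("one-sided p-value:", round(p, 4), " posterior P(H0):", round(post_h0, 4))
```

In this special case, the 95% confidence interval and the 95% credible interval have the same endpoints. The one-sided p-value also equals the posterior probability of H0, although the two are interpreted differently. With an informative prior, the numbers would no longer agree.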
In frequentist inference, a null hypothesis is tested against an alternative hypothesis, and a level of statistical significance must be chosen before the test is carried out. The frequentist interpretation of probability is based on long-run frequencies. In Bayesian inference, by contrast, the probability of each hypothesis given the data set is estimated directly (Savchuk & Tsokos, 2011). Bayesian hypothesis testing uses the posterior distribution obtained from the data, and Bayes factors can be used to compare hypotheses, although they can be difficult to calculate. It is often said that the frequentist approach dominates Bayesian hypothesis testing in practice, because its methods give results from a data set in the simplest way. Proponents of the Bayesian approach, on the other hand, argue that Bayesian inference is coherent, whereas frequentist inference is not. Both approaches have one-sided and two-sided tests. The frequentist approach tests a null hypothesis against an alternative, while the Bayesian approach tests one-sided and two-sided hypotheses about the parameters. A frequentist hypothesis test divides the sample space into a rejection region and an acceptance region. A Bayesian test of a two-sided hypothesis, by contrast, can check whether the null value lies inside a Bayesian credible interval (Bolstad, 2004).

Chapter 14 summary

This chapter shows how Bayesian inference can be made robust against a misspecified prior. The analysis in the chapter shows that using a mixture of conjugate priors helps to protect against a misspecified prior.
When the prior and the likelihood conflict, the posterior probability that the prior is misspecified becomes large, and the posterior distribution then depends mainly on the likelihood. The experiments in this chapter show that the prior is often chosen from data of past experiments before the current data are examined. This can lead to the misconception that similar experiments will give the same results. That is not necessarily true, because the prior can be misspecified: knowing the prior is not enough when the likelihood is precise. In that case, the posterior gives high probability to parameter values that are supported by the data but not by the prior. If one is forced to choose between the prior and the data, one should choose the data (Bolstad, 2004). The chapter introduces a new indicator random variable, I, which indicates whether the original prior is misspecified; misspecification is given a small prior probability. To make the prior more robust, the chapter uses the mixture prior

$$g(\theta) = P(I = 0) \times g_0(\theta) + P(I = 1) \times g_1(\theta),$$

where g0 is the original prior and g1 is a more widely spread prior. The chapter also uses the joint posterior distribution of I and θ given the data y from the experiments:

$$g(\theta, i \mid y) \propto P(I = i)\, g_i(\theta)\, f(y \mid \theta), \qquad i = 0, 1.$$

The indicator variable can then be marginalized out: the marginal posterior distribution of θ is found by summing the joint posterior over the values of I (Bolstad, 2004).
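The following is a minimal sketch of the mixture-prior idea, assuming a binomial proportion with beta components so that every step stays in closed form. The settings are all hypothetical:

- g0, the original prior, is Beta(10, 10);
- g1, the widely spread prior, is Beta(1, 1);
- the prior probability of misspecification, P(I = 1), is 0.05.

The posterior weight of each component is proportional to its prior weight times the marginal likelihood of the data under that component. Summing over I gives the marginal posterior of θ as a mixture of the updated beta components. `mixture_posterior` is an illustrative name.

```python
import math


def log_beta(x: float, y: float) -> float:
    """Logarithm of the beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def mixture_posterior(k, n, components, weights):
    """Posterior for a binomial proportion under a mixture of beta priors.

    components: list of (a_i, b_i) defining the beta prior g_i.
    weights:    prior probabilities P(I = i).
    Returns the posterior weights P(I = i | data) and the updated beta components.
    The binomial coefficient is common to every component and cancels.
    """
    # log of P(I = i) * B(a + k, b + n - k) / B(a, b), the marginal likelihood under g_i.
    log_terms = [math.log(w) + log_beta(a + k, b + n - k) - log_beta(a, b)
                 for (a, b), w in zip(components, weights)]
    m = max(log_terms)  # subtract the maximum for numerical stability
    unnorm = [math.exp(t - m) for t in log_terms]
    total = sum(unnorm)
    post_weights = [u / total for u in unnorm]
    post_components = [(a + k, b + n - k) for a, b in components]
    return post_weights, post_components


prior_components = [(10.0, 10.0), (1.0, 1.0)]  # hypothetical g0 (original) and g1 (widely spread)
prior_weights = [0.95, 0.05]                    # hypothetical P(I = 0) and P(I = 1)

for k in (11, 19):  # data agreeing with g0, then data in conflict with it
    w, comps = mixture_posterior(k, 20, prior_components, prior_weights)
    mean = sum(wi * a / (a + b) for wi, (a, b) in zip(w, comps))
    print(f"k={k}/20: P(I=1 | data)={w[1]:.3f}, posterior mean={mean:.3f}")
```

With 11 successes in 20 trials, the data agree with g0 and P(I = 1 | data) stays small. With 19 successes, the data conflict with g0: most of the posterior weight moves to the widely spread component, and the posterior follows the likelihood.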
The mixture index variable is a nuisance parameter, but it can be marginalized out with the help of Bayes' theorem. Mixtures of priors are very robust against misspecification of the prior. If the original prior is correct, the mixture posterior will be close to the posterior obtained from the original prior. If instead the likelihood is concentrated far from the original prior, the posterior probability of the original component will be small. The mixture posterior therefore gives satisfactory results in both cases. If there is a conflict between the prior and the likelihood, the likelihood should be preferred because it is based on the data. The prior may rest on faulty information and cannot reflect changes between old and current data, which weakens the analysis and the reliability of its results. Giving priority to the likelihood resolves the conflict between the original prior and the data (Bolstad, 2004). In summary, chapter fourteen shows that robust Bayesian methods use mixture priors, and that a mixture prior protects against a misspecified prior. Misspecification is an important practical concern, so statisticians use these Bayesian methods to balance the prior against the likelihood of the data. Applying Bayes' theorem to a mixture prior to obtain a mixture posterior plays an essential role in current practice.

Conclusion

In conclusion, Bayes' theorem plays a major role in the field of statistics. It is widely used by statisticians, including for testing hypotheses, and the Bayesian framework offers many different methods for obtaining accurate results from data.
In many cases, Bayes' theorem is a reliable way of obtaining results that can be stated directly as probabilities. Bayesian and frequentist approaches have also been observed to give similar results. MLE and MAP estimation provide point estimates of the parameters within this framework, and the prior distribution plays a crucial role in gauging the parameters in Bayesian analysis.

List of References

Bolstad, W.M., 2004. Introduction to Bayesian Statistics. 1st ed. New York: John Wiley & Sons.
Haug, A.J., 2012. Bayesian Estimation and Tracking: A Practical Guide. 1st ed. New York: John Wiley & Sons.
Jackman, S., 2009. Bayesian Analysis for the Social Sciences. New Jersey: John Wiley & Sons.
Leonard, T. & Hsu, J.S.J., 1999. Bayesian Methods: An Analysis for Statisticians and Interdisciplinary Researchers. 3rd ed. New York: Cambridge University Press.
Savchuk, V. & Tsokos, C.P., 2011. Bayesian Theory and Methods with Applications. 1st ed. New York: Springer Science & Business Media.
Stauffer, H.B., 2007. Contemporary Bayesian and Frequentist Statistical Research Methods for Natural Resource Scientists. 1st ed. New York: John Wiley & Sons.
