Major Calculations in Biostatistics - Assignment Example

BIOSTATISTICS ASSIGNMENTS (Q1-Q4)

Question 1

The p-value comparing the first and second screening observations is 0.967 for presence and 0.80851 for absence. Both values are greater than 0.05, so the null hypothesis cannot be rejected: the observations are statistically the same in the first and the second screening. The t test and the chi-square test were used to reach this conclusion, and the null hypothesis is therefore retained.

Question 2

1. The odds ratio (usually abbreviated "OR") is one of three main ways to quantify how strongly having or not having property A is linked with having or not having property B in a given population. As the name implies, the OR is computed in three steps:

1) compute the odds that an individual in the population has "A" given that he or she has "B";
2) compute the odds that an individual in the population has "A" given that he or she does not have "B"; and
3) divide the first odds by the second odds to obtain the odds ratio, the OR.

If the OR is greater than 1, then having "A" is "associated" with having "B" in the sense that having "B" raises (relative to not having "B") the odds of having "A".

Odds = number of cases with a positive outcome / number of cases with a negative outcome in that group

For smokers the odds are 42/22 = 1.91, and for non-smokers the odds are 85/62 = 1.37. The odds ratio is the ratio of the odds of the outcome in the two groups.

The calculation used the following inputs:

Cases with a positive outcome (with complications): number in 1st group = a, number in 2nd group = b
Cases with a negative outcome (without complications): number in 1st group = c, number in 2nd group = d

Odds ratio = (a/c) / (b/d)

Results

Odds ratio     1.3925
95% CI         0.7559 to 2.5652
z statistic    1.062
P              0.2881

d. Odds ratio 1.3925; 95% CI 0.7559 to 2.5652; z statistic 1.062; significance level P = 0.2881. Because the 95% CI includes 1 and P is greater than 0.05, the association between smoking and complications is not statistically significant.

e. and f. Mantel-Haenszel method of calculation of effect measure in meta-analysis (for randomized controlled trials). For more information, see Petitti, D., Meta-Analysis, Decision Analysis and Cost-Effectiveness Analysis.

Row      Weight (= S)   Odds ratio   Product   F       R       R squared   S squared   G       H
1        8.86           1.39         12.34     6.08    12.34   152.31      78.54       10.63   4.49
2        38.40          1.31         50.22     24.59   50.22   2521.66     1474.56     44.43   19.60
Total    47.26                       62.56     30.67   62.56   2673.96     1553.10     55.06   24.09

g. Confounding effect: In statistics, a confounding variable (also confounding factor, hidden variable, lurking variable, a confound, or confounder) is an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent variable (complication) and the independent variable (smoker/non-smoker). A perceived relationship between an independent variable and a dependent variable that has been mis-estimated because a confounding factor was not accounted for is termed a spurious relationship, and the resulting mis-estimation is termed omitted-variable bias. Because the variability among human volunteers cannot be controlled, confounding is a particular challenge in this case; here the confounding effects may be due to environmental pollution.

h. In statistics and mathematical epidemiology, relative risk (RR) is the ratio of the probability of an event occurring (for example, developing a disease or being injured) in an exposed group to the probability of the event occurring in a comparison, non-exposed group. In our case study the estimated risk of complications in smokers is about 99% of that in non-smokers (RR = 0.9889), but it would be incorrect to take this RR at face value, because confounders such as exposure to other air pollutants, if taken into account, might change the RR.
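The odds ratio, confidence interval, z statistic and P value reported in part d can be reproduced from the counts quoted earlier for Question 2. The sketch below is one way to do that check, not the calculator the essay used. It assumes that a = 42 and c = 22 are the smokers with and without complications and that b = 85 and d = 62 are the corresponding non-smokers (inferred from the odds 42/22 and 85/62), and it assumes the interval is the usual log-scale (Woolf) interval. The function name odds_ratio is illustrative.

```python
import math


def odds_ratio(a, b, c, d, z_crit=1.959964):
    """Odds ratio (a/c)/(b/d) with a log-scale 95% CI, z statistic and p-value.

    a, b: cases with a positive outcome in group 1 and group 2
    c, d: cases with a negative outcome in group 1 and group 2
    """
    or_ = (a / c) / (b / d)
    log_or = math.log(or_)
    # Standard error of log(OR) under the Woolf approximation
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z_crit * se_log)
    upper = math.exp(log_or + z_crit * se_log)
    z = log_or / se_log
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return or_, lower, upper, z, p


# Assumed mapping of the Question 2 counts onto the a, b, c, d cells
or_, lower, upper, z, p = odds_ratio(a=42, b=85, c=22, d=62)
print(f"OR = {or_:.4f}, 95% CI {lower:.4f} to {upper:.4f}, z = {z:.3f}, P = {p:.4f}")
```

Under these assumptions the printed values agree with the reported 1.3925, 0.7559 to 2.5652, 1.062 and 0.2881 to the precision shown in the essay.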
Relative risk         0.9889
95% CI                0.8639 to 1.1321
z statistic           0.161
Significance level    P = 0.8720
NNT (Harm)            -201.316
95% CI                15.277 (Harm) to ∞ to 18.01 (Benefit)

i. Multiple Linear Regression - Estimated Regression Equation

detrac[t] = +9.25034375 petrac[t] +5.12519921875 letrac[t] -5.2814375 cetrac[t] +1608.062 + e[t]

Multiple Linear Regression - Ordinary Least Squares

Variable     Parameter    S.E.               T-STAT (H0: parameter = 0)   2-tail p-value   1-tail p-value
petrac[t]    9.250344     5513065852.1968    0                            1                0.5
letrac[t]    5.125199     NAN                NAN                          0                0
cetrac[t]    -5.281438    NAN                NAN                          0                0
Constant     1608.062     NAN                NAN                          0                0

Variable      Elasticity   S.E.*              T-STAT (H0: |elast| = 1)     2-tail p-value   1-tail p-value
%petrac[t]    5.656313     3371077241.6543    0                            1                0.5
%letrac[t]    6.406451     NAN                NAN                          0                0
%cetrac[t]    -4.175232    NAN                NAN                          0                0
%Constant     10.86488     NAN                NAN                          0                0

Variable       Stand. Coeff.   S.E.*             T-STAT (H0: coeff = 0)    2-tail p-value   1-tail p-value
S-petrac[t]    0.47995         286043160.31001   0                         1                0.5
S-letrac[t]    7.881086        NAN               NAN                       0                0
S-cetrac[t]    -2.740476       NAN               NAN                       0                0
S-Constant     0               NAN               NAN                       0                0

*Note: computed against deterministic endogenous series

Variable     Partial Correlation
petrac[t]    NAN
letrac[t]    NAN
cetrac[t]    NAN
Constant     NAN

Critical Values (alpha = 5%)
1-tail CV at 5%    1.14
2-tail CV at 5%    1.26

Multiple Linear Regression - Regression Statistics
F-TEST                0.6677
Observations          2
Degrees of Freedom    -2

With only 2 observations and 4 estimated parameters, the residual degrees of freedom are negative, so the standard errors, test statistics and p-values in this output cannot be interpreted.

Multiple Linear Regression - Residual Statistics
Standard Error        5440.052607
Sum Squared Errors    14286844.340183
Log Likelihood        -18.61958
Durbin-Watson         0.067168
Von Neumann Ratio     0.134336
# e[t] > 0            0
# e[t] < 0            2
# Runs                1
Runs Statistic        0

Multiple Linear Regression - Ad Hoc Selection Test Statistics
Akaike (1969) Final Prediction Error                -21430266.510275
Akaike (1973) Log Information Criterion             19.781703
Akaike (1974) Information Criterion                 390017635.39275
Schwarz (1978) Log Criterion                        17.167997
Schwarz (1978) Criterion                            28573688.680366
Craven-Wahba (1979) Generalized Cross Validation    7143422.170092
Hannan-Quinn (1979) Criterion                       1648952.560854
Rice (1984) Criterion                               -2381140.723364
Shibata (1981) Criterion                            35717110.850458

Multiple Linear Regression - Analysis of Variance

ANOVA         DF    Sum of Squares      Mean Square
Regression    3     -14264370.432143    -4754790.144048
Residual      -2    14286844.340183     -7143422.170092
Total         1     22473.908041        22473.9080405

F-TEST 0.6677, p-value 0

Question 3

1. Odds ratio

Cases with a positive outcome: number in 1st group = a, number in 2nd group = b
Cases with a negative outcome: number in 1st group = c, number in 2nd group = d

Results

Odds ratio     1.7705
95% CI         0.7380 to 4.2477
z statistic    1.279
P              0.2007

The odds ratio is the ratio of the odds of the outcome in the two groups.

Odds ratio = (a/c) / (b/d)

Of the total population of 964 persons, 764 did not have cancer and 201 had cancer. There were 920 alcoholics and 44 non-alcoholics. Of the 920 alcoholics, 201 had cancer, and 6 of the non-alcoholics had cancer. These data were entered into the table above.

2. The assumption made was that the population could be separated into alcoholic patients diagnosed with cancer and non-alcoholic patients affected by cancer, which is how the command was executed.

3. Coefficients of Bias-Reduced Logistic Regression

Variable       Parameter            S.E.                   t-stat               2-sided p-value
(Intercept)    -2.93455571787418    0.180284017808738      -16.2774035854217    0
talcohol       0.0257026043711433   0.00231153795941735    11.1192655376604     0

Summary of Bias-Reduced Logistic Regression
Deviance                       827.43514511717
Penalized deviance             810.42645681898
Residual Degrees of Freedom    962
ROC Area                       0.763684554973822

Hosmer–Lemeshow test
Chi-square            18.8274970136892
Degrees of Freedom    8
P(>Chi)               0.0158099201169273

The H-L test: The Hosmer–Lemeshow test is a statistical test for goodness of fit for logistic regression models. It is used frequently in risk prediction models. The test assesses whether or not the observed event rates match expected event rates in subgroups of the model population.
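The Hosmer–Lemeshow P value reported above can be checked directly from the chi-square statistic (18.8274970136892) and its 8 degrees of freedom. The sketch below is an independent check, not the routine used by the software that produced the output. It relies on the closed form of the chi-square upper tail that holds for even degrees of freedom, and the helper name chi2_sf_even_df is illustrative.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2k the tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Values taken from the Hosmer-Lemeshow output above
p_value = chi2_sf_even_df(18.8274970136892, 8)
print(round(p_value, 4))  # about 0.0158, in line with the reported P(>Chi)
```

For odd degrees of freedom this shortcut does not apply, and a statistics library would be the safer choice.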
The Hosmer–Lemeshow test specifically identifies subgroups as the deciles of fitted risk values. Models for which expected and observed event rates in subgroups are similar are called well calibrated. The Hosmer–Lemeshow test statistic is given by:

$$H = \sum_{g=1}^{n} \frac{(O_g - E_g)^2}{N_g \pi_g (1 - \pi_g)}$$

Here $O_g$, $E_g$, $N_g$, and $\pi_g$ denote the observed events, expected events, observations, and predicted risk for the gth risk decile group, and n is the number of groups. The test statistic asymptotically follows a $\chi^2$ distribution with n - 2 degrees of freedom. The number of risk groups may be adjusted depending on how many fitted risks are determined by the model. This helps to avoid singular decile groups.

For an adequate fit the H-L p-value should be greater than 0.05. In this case the value is about 0.016, which is less than 0.05, so the observed event rates in the risk groups do not match the rates predicted by the fitted model. This signifies that the population characteristics change when the exposure groups change in magnitude or direction.

4. Multiple Linear Regression - Estimated Regression Equation

age[t] = -0.020691264618861 tobacco[t] +0.027124667541005 talcohol[t] +51.03715471134 + e[t]

Multiple Linear Regression - Ordinary Least Squares

Variable       Parameter    S.E.        T-STAT (H0: parameter = 0)   2-tail p-value   1-tail p-value
tobacco[t]     -0.020691    0.035174    -0.588247                    0.556505         0.278252
talcohol[t]    0.027125     0.011501    2.35854                      0.018546         0.009273
Constant       51.037155    0.817947    62.396622                    0                0

Variable        Elasticity   S.E.*       T-STAT (H0: |elast| = 1)     2-tail p-value   1-tail p-value
%tobacco[t]     -0.004737    0.008052    -123.599266                  0                0
%talcohol[t]    0.027503     0.011661    -83.397571                   0                0
%Constant       0.977234     0.015662    -1.453622                    0.146378         0.073189

Variable         Stand. Coeff.   S.E.*       T-STAT (H0: coeff = 0)    2-tail p-value   1-tail p-value
S-tobacco[t]     -0.019163       0.032576    -0.588247                 0.556505         0.278252
S-talcohol[t]    0.076832        0.032576    2.35854                   0.018546         0.009273
S-Constant       0               0           0                         1                0.5

*Note: computed against deterministic endogenous series

Variable       Partial Correlation
tobacco[t]     -0.018972
talcohol[t]    0.075863
Constant       0.895563

Critical Values (alpha = 5%)
1-tail CV at 5%    1.65
2-tail CV at 5%    1.96

Multiple Linear Regression - Regression Statistics
Multiple R            0.076182
R-squared             0.005804
Adjusted R-squared    0.003735
F-TEST                2.804955
Observations          964
Degrees of Freedom    961

Multiple Linear Regression - Residual Statistics
Standard Error                  13.970379
Sum Squared Errors              187559.806718
Log Likelihood                  -3908.363909
Durbin-Watson                   1.794946
Von Neumann Ratio               1.79681
# e[t] > 0                      474
# e[t] < 0                      490
# Runs                          437
Stand. Normal Runs Statistic    -2.956918

Multiple Linear Regression - Ad Hoc Selection Test Statistics
Akaike (1969) Final Prediction Error                195.778875
Akaike (1973) Log Information Criterion             5.276986
Akaike (1974) Information Criterion                 195.778871
Schwarz (1978) Log Criterion                        5.292145
Schwarz (1978) Criterion                            198.769291
Craven-Wahba (1979) Generalized Cross Validation    195.780771
Hannan-Quinn (1979) Criterion                       196.912119
Rice (1984) Criterion                               195.782679
Shibata (1981) Criterion                            195.775095

Multiple Linear Regression - Analysis of Variance

ANOVA         DF     Sum of Squares    Mean Square
Regression    2      1094.894527       547.447263
Residual      961    187559.806718     195.171495
Total         963    188654.701245     195.90311655744

F-TEST 2.804955, p-value 0.061005

Question 4

1. Figure of Kaplan–Meier survival analysis: The Kaplan–Meier estimator, also known as the product-limit estimator, is an estimator of the survival function from lifetime data. In medical research, it is often used to measure the fraction of patients living for a certain amount of time after treatment.
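To make the product-limit idea concrete, the sketch below computes a Kaplan–Meier curve by hand. It is a minimal illustration, not the software used for the assignment: the follow-up times and event indicators are hypothetical toy values, not the study data, and the function name kaplan_meier is illustrative. At each distinct event time the running survival estimate is multiplied by (1 - events / number at risk), and right-censored subjects leave the risk set without changing the estimate.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with an observed event.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if right-censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather every subject whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the conditional survival at t
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after t
        at_risk -= n_leaving
    return curve


# Hypothetical toy data (not taken from the assignment)
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 1, 0, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

A full analysis would also need confidence bands and a test comparing groups, which this sketch leaves out.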
A plot of the Kaplan–Meier estimate of the survival function is a series of horizontal steps of declining magnitude which, when a large enough sample is taken, approaches the true survival function for that population. The value of the survival function between successive distinct sampled observations ("clicks") is assumed to be constant. An important advantage of the Kaplan–Meier curve is that the method can take into account some types of censored data, particularly right-censoring, which occurs if a patient withdraws from a study, i.e. is lost from the sample before the final outcome is observed. On the plot, small vertical tick marks indicate losses, where a patient's survival time has been right-censored.

2. The survival analysis indicates that the longer the treatment continues, the more time passes before a patient reverts to drugs. The two curves show that a longer treatment period is associated with longer survival, that is, a longer time before relapse into drug addiction. The long-treatment group is therefore more likely to remain drug free.

Survival time: time
Endpoint: censor
Factor codes: treat

3. Significance: P = 0.0365. The output shows a significant difference between the survival curves of the two groups analyzed, because the p-value is less than 0.05. The null hypothesis that the curves differ only by chance is rejected in favour of the alternative hypothesis that the survival curves differ with the duration of treatment.

4. Hazard ratio and 95% CI

Factor 0: hazard ratio = 0.8283 (95% CI 0.6933 to 0.9895)
Factor 1: hazard ratio = 1.2074 (95% CI 1.0106 to 1.4424)

In a survival analysis the hazard ratio (HR) is the ratio of the hazard rates corresponding to the conditions described by two levels of an explanatory variable. For example, in a drug study, the treated population may die at twice the rate per unit time as the control population. The hazard ratio would be 2, indicating a higher hazard of death from the treatment. Or in another study, men receiving the same treatment may suffer a certain complication ten times more frequently per unit time than women, giving a hazard ratio of 10. Hazard ratios differ from relative risk ratios in that the latter are cumulative over an entire study, using a defined endpoint, while the former represent instantaneous risk over the study time period, or some subset thereof. Hazard ratios suffer somewhat less from selection bias with respect to the endpoints chosen, and can indicate risks that happen before the endpoint.

The hazard ratios indicate that people on short-duration treatment are more likely to return to drugs (hazard ratio 1.2074), while those on long-duration treatment are less likely to relapse and so more likely to remain drug free (hazard ratio 0.8283). The 95% CI gives the range, between its lower and upper limits, within which the population hazard ratio is expected to lie.

5. The assumptions made are:
1. The longer the rehabilitation (treat 1 = long duration), the more likely one is to stay drug free (censor = 0).
2. The shorter the rehabilitation (treat 0 = short duration), the more likely one is to revert to drug addiction (censor = 1).
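The two hazard ratios in part 4 appear to describe the same comparison from opposite directions, in which case each should be approximately the reciprocal of the other, with the confidence limits swapped. The short sketch below checks this using only the figures reported in part 4; the helper name invert_hazard_ratio is illustrative, and small differences are expected because the reported values are rounded to four decimals.

```python
def invert_hazard_ratio(hr, ci):
    """Express a hazard ratio and its CI with the reference group swapped."""
    lower, upper = ci
    # Taking reciprocals reverses the order of the confidence limits
    return 1 / hr, (1 / upper, 1 / lower)


# Figures reported in part 4
hr_0, ci_0 = 0.8283, (0.6933, 0.9895)
hr_1, ci_1 = 1.2074, (1.0106, 1.4424)

inv_hr, inv_ci = invert_hazard_ratio(hr_0, ci_0)
print(round(inv_hr, 4), tuple(round(x, 4) for x in inv_ci))
print(hr_1, ci_1)
```

Both printed lines should agree to about three decimal places, which supports reading 0.8283 and 1.2074 as one effect stated in two directions rather than as two separate findings.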
