Information Technology: Use of Statistical Measures in Research Essay Example | Topics and Well Written Essays

Running head: INFORMATION TECHNOLOGY-USE OF STATISTICAL MEASURES IN RESEARCH

Information Technology-Use of Statistical Measures in Research

Abstract

Organizations the world over collect a wide variety of information for differing purposes, normally in the form of data, numeric or otherwise. Information technology research papers serve such purposes too, ranging from developing a computer security system and evaluating the effectiveness of operating systems to devising new techniques for detecting hidden information in web graphic files. Systematic data crunching and mining have become important because they turn unwieldy raw data into quantified output that supports the argument or hypothesis in the main narrative. Statistical measures are the tools that achieve this data crunching and mining. This paper studies two of the most common kinds of statistical measure used in information technology research: measures of central tendency and measures of variability.

Introduction

Statistics lets the researcher view collected data in two ways. Descriptive statistics describes the shape of the data; frequency and distribution are forms of descriptive statistics that help with this. Descriptive statistics uses measures such as the mean, median, mode, correlation and covariance. The data may come from a sample or from a whole population, so that, for example, a population mean may be compared with sample means. Inferential statistics attempts to fit a model to the collected data and to draw conclusions, including conclusions about possible causal relationships, that extend beyond the data at hand. It is also used to develop predictive models based on such analysis. This paper explores mainly the simple concepts of descriptive statistics and does not touch on inferential statistics. Statistical measures have no real existence of their own; they are mental constructs that simply support an argument or hypothesis.
While statistics helps to summarize and organize data, interpreting that data on the way to a hypothesis remains the primary task of the researcher.

Review of Literature

A comparative cost-of-ownership analysis of server operating systems made elaborate use of mean analysis and t-test significance (Cahners In-Stat Group, 1997). A mean and standard deviation model, a multivariate model, a Markov process model and a time series model were used as statistical techniques in developing misuse detection systems (Chung, 1997). Statistical user profiles were used as part of a multilayered security system (Schall, 1999). A combination of arithmetic mean, median and standard deviation gave sufficient support for the conclusions of a survey on operating systems (Mathog, 1998).

Discussion

A basic primer on descriptive statistics is necessary not only for understanding these concepts but also for showing their specific use on research data.

"The most frequently used average is the Mean, which is the balance point in a distribution. Its computation is simple - just add up the scores and divide by the number of scores. Formally, the mean is the value around which the deviations sum to zero. The formal definition also explains why one informally defines the mean as the balance point in a distribution. At the mean value the positive and negative deviations balance each other out. A major drawback of the mean is that it moves in the direction of extreme scores. If in any two distributions most values are about the same size, but in one distribution one or two values are inordinately high, then the mean of that distribution would be pulled up greatly in comparison to the other distribution. This is a skewed distribution. For such skewed distributions a different average, the Median, which is defined as the middle score, is used. To get an approximate median, the scores are put in order from low to high and counted up to the middle score, which is set as the median.
The Mode is simply the score with the highest frequency. The mode is sometimes used in informal reporting but is rarely used in formal research." (Wilson, Module 1).

Thus the mean should be used where the collected data is prima facie representative and free of outliers, so that the resulting mean value is useful for decision making. The median simply indicates the middle value of an ordered sequence of values and can be used as such on any data set. The mode represents the most frequently recurring value in a data set and is therefore to be used where recurrence is expected and a conclusion about that recurrence is required.

"The variance is a measure of how spread out a distribution is. It is computed as the average squared deviation of each number from its mean. The standard deviation (SD), a measure of spread, is simply derived as the square root of the variance. An important attribute of the SD, as a measure of spread, is that if the mean and SD of a Normal distribution are known, it is possible to compute the percentile rank associated with any given score. In a normal distribution, about 68% of the scores are within one SD of the mean and about 95% of the scores are within two SDs of the mean" (HyperStat Online Textbook).

That is, in a Normal distribution, observations that have a standardized value of less than -2 or more than +2 have a relative frequency of 5% or less. (The standardized value of an observation is its difference from the mean, divided by the standard deviation.) If one has access to the STATISTICA software, the exact probabilities associated with different values in the normal distribution can be explored using the interactive Probability Calculator tool provided in that package. Much of the data collected in information technology research is assumed to be approximately normally distributed, and therefore the SD can be used together with the mean to draw conclusions about the spread of such data.
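The measures described in this discussion can be illustrated with a short Python sketch that uses only the standard library. The page-load times are invented for illustration, and the names summarize and standardize are hypothetical helpers rather than anything taken from the cited sources. The sketch uses the population variance because it matches the definition quoted here (the average squared deviation from the mean).

```python
from statistics import NormalDist, mean, median, mode, pstdev, pvariance

# Hypothetical page-load times in milliseconds (made up for illustration).
typical = [12, 14, 14, 15, 16, 17, 17, 17, 22]
# The same scores plus one extreme value, giving a skewed distribution.
skewed = typical + [250]


def summarize(scores):
    """Return the central-tendency and spread measures for a list of scores."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "mode": mode(scores),
        # Population variance: the average squared deviation from the mean.
        "variance": pvariance(scores),
        # Standard deviation: the square root of the variance.
        "sd": pstdev(scores),
    }


def standardize(score, mu, sigma):
    """Standardized value: difference from the mean divided by the SD."""
    return (score - mu) / sigma


base = summarize(typical)
with_outlier = summarize(skewed)

# Deviations from the mean sum to zero (the "balance point" property).
deviation_sum = sum(score - base["mean"] for score in typical)

# One extreme score moves the mean from 16 to 39.4,
# while the median only moves from 16 to 16.5.
print("mean:  ", base["mean"], "->", with_outlier["mean"])
print("median:", base["median"], "->", with_outlier["median"])

# Share of a normal distribution within one and two SDs of the mean.
standard_normal = NormalDist()
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_two_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)
print(f"within 1 SD: {within_one_sd:.1%}, within 2 SDs: {within_two_sd:.1%}")

# Percentile rank of a score of 20, ASSUMING the scores are normally
# distributed with the mean and SD computed above.
z = standardize(20, base["mean"], base["sd"])
percentile_rank = NormalDist(base["mean"], base["sd"]).cdf(20)
print(f"z = {z:.2f}, percentile rank = {percentile_rank:.1%}")
```

The comparison of the two lists shows why the median is preferred for skewed data: a single outlier more than doubles the mean but barely shifts the median.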
A common and widely used application of the SD is the concept of Value-at-Risk (VaR). Here the SD serves as a measure of the risk and spread of normally distributed data and is captured in a measure called volatility. VaR can be applied effectively in information technology research areas where cost concepts are involved and the data can be assumed to be normal.

Information technology research in areas such as computer security systems, surveys of internet usage, surveys of operating system usage, studies of which web pages are used most frequently and by which categories of people, studies in e-commerce and m-commerce, and the design of WANs and LANs has used the concepts of descriptive statistics to support its underlying hypotheses. Hypotheses supported in this way have led to the development and commercialization of the concept, product or idea. An illustrative application of statistical measures in core information technology research is described here to drive the point home.

Kimberly Patch writes that steganography, the practice of hiding a secret message in written or audio information, has become important as computers and the Web generate huge volumes of digital information. Identifying such hidden information is a tedious task. A Dartmouth College researcher has devised a method based on statistical measures to detect hidden information in digital images, which can contain a megabyte or more of information. Digital images are made up of pixels, or dots of color. Particularly in high-resolution digital images that have one million or more different shades of color, it is easy to hide a message by slightly altering these colors in ways that are imperceptible to the human eye. In an untampered image, however, the information that makes up the image is not simply random.
The key to the Dartmouth detection method was to create a statistical profile of the compressed data files that make up natural, undisturbed images, and then to check a given image against that profile. To detect hidden messages in an image it was first necessary to characterize the statistics of natural images, on the expectation that these statistics are disturbed when a message is hidden in an image. When images are compressed so that they can be stored as smaller files, the digital information that indicates the color of each pixel is changed into wavelet information. Wavelet mathematics describes an image in terms of functions localized by spatial position, orientation and scale. Wavelets allow for compression because all the information that makes up a wavelet can be reconstructed from only a portion of that information, so an image is compressed by storing only the portion that is needed to reconstruct the whole.

Two types of wavelet statistics were collected: measures of variation such as the mean, variance, skewness and kurtosis of the coefficients (the numbers that make up the wavelets), and information about the rate of errors that occur when full wavelets are reconstructed from compressed information. The variation and error-rate statistics were combined into a vector. By comparing this profile vector with the same information from an individual image, it was easy to tell whether the image had been disturbed by a hidden message (Patch, 2005).

Statistical concepts were thus central to the examination of disturbed digital images in this research effort. Simple descriptive measures such as the mean and variance were used to build the statistical profile of an image and to compare it with control profiles in order to identify hidden information.
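The idea of checking an image's statistics against a profile of undisturbed images can be sketched in Python as follows. This is a deliberately simplified stand-in, not the Dartmouth researcher's method: synthetic Gaussian numbers play the role of wavelet coefficients, the error-rate statistics are omitted, and a plain distance-in-standard-deviations comparison replaces whatever comparison the original work used. All function names, the seed and the threshold are assumptions made for illustration.

```python
import random
from statistics import fmean, stdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable
THRESHOLD = 6.0         # arbitrary cut-off, in SDs, chosen for illustration


def moments(values):
    """Return (mean, variance, skewness, kurtosis) using population formulas."""
    mu = fmean(values)
    deviations = [v - mu for v in values]
    m2 = fmean([d ** 2 for d in deviations])
    m3 = fmean([d ** 3 for d in deviations])
    m4 = fmean([d ** 4 for d in deviations])
    return (mu, m2, m3 / m2 ** 1.5, m4 / m2 ** 2)


def build_profile(clean_samples):
    """Per-statistic (mean, SD) pairs computed across undisturbed samples."""
    vectors = [moments(sample) for sample in clean_samples]
    return [(fmean(column), stdev(column)) for column in zip(*vectors)]


def max_deviation(values, profile):
    """Largest distance, in profile SDs, between a sample's statistics and the profile."""
    return max(
        abs(stat - mu) / sd for stat, (mu, sd) in zip(moments(values), profile)
    )


def synthetic_coefficients(n=2000):
    """Stand-in for the wavelet coefficients of one undisturbed image."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def embed_noise(values, strength=1.5):
    """Crude stand-in for hiding a message: perturb every coefficient."""
    return [v + rng.uniform(-strength, strength) for v in values]


# Build the profile from 20 synthetic "natural" images.
profile = build_profile([synthetic_coefficients() for _ in range(20)])

fresh = synthetic_coefficients()                   # undisturbed, not in the profile
tampered = embed_noise(synthetic_coefficients())   # disturbed

for name, sample in (("fresh", fresh), ("tampered", tampered)):
    score = max_deviation(sample, profile)
    print(f"{name}: max deviation = {score:.1f} SDs, flagged = {score > THRESHOLD}")
```

In this toy setting the added noise mainly inflates the variance of the coefficients, which is why the disturbed sample falls far outside the profile. Real hidden messages alter image statistics much more subtly, so a practical detector would need richer features and a trained classifier.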
In practice this method is relevant to protecting copyrights in digital media, to secure military and intelligence communication, to uncovering covert criminal communication and the trafficking of illegal pornography, and to protecting civilian speech against repressive governments. The method was also found useful in detecting forgeries of art works.

Conclusion

Information technology research uses statistical measures effectively in a variety of ways, from research into the development of systems to the development of techniques. A wide array of statistical packages is available for advanced analysis such as multiple regression (linear or otherwise), t-tests, z-tests, the chi-square test, ANOVA (one-way and two-way), ANCOVA, correlation coefficients, simulation and electronic data collection. These can be studied further to understand fully the concepts behind such statistical measures. Thereafter, depending on the data collected and the hypothesis formed, any of these techniques can be applied, alone or in combination with others, to support the information technology research hypothesis.

References

Cahners In-Stat Group. (April, 1997). Server Operating Systems: A Comparative Cost-of-Ownership Analysis. White paper. Retrieved February 16, 2006 from http://www.instat.com/mscoowp2.htm

Chung, Christina Yip. (December, 1997). A Survey of Misuse Detection Systems. Retrieved February 16, 2006 from http://seclab.cs.ucdavis.edu/chungy/doc/MDS.htm

Schall, Steve. (May, 1999). The Enemies Within: Building a Multi-Layered System Security. Enterprise Security. Retrieved February 21, 2006 from http://www.esj.com/article.aspxID=539921702PM

Mathog, David. (February, 1998). Operating system survey results. Retrieved February 21, 2006 from http://seqaxp.bio.caltech.edu/www/os_survey_results.html

HyperStat Online Textbook. Chapter 1: Introduction. Variance and standard deviation. Retrieved February 16, 2006 from http://davidmlane.com/hyperstat/intro.html

Wilson, Danford L. Understanding and Using Statistical Research Methodologies in Medical Education Programs: A Primer for Medical Students and Residents. Director for Graduate Medical Education, University of Kansas - School of Medicine.

Patch, Kimberly. (August, 2005). Statistics sniff out secrets. Technology Research News. Retrieved February 16, 2006 from http://www.trnmag.com/Stories/2001/092601/Statistics_sniff_out_secrets_092601.html
