Large-Scale Data Processing Technique - Research Paper Example

Summary
The paper "Large-Scale Data Processing Technique" highlights that the design hypothesis would be ideal and appropriate for this study, since, data will be collected and evaluated using varying methods and strategies that would result in completing overlapping and non-overlapping strengths…


HYPOTHETICAL DESIGNS

Envisioned Research Problem

A popular large-scale data processing technique that has been extensively utilized in recent times for tasks that need direct human input is crowdsourcing. Crowdsourcing, according to Howe and Robinson (2006), refers to a novel online business model and problem-solving technique that utilizes the creative capabilities of a distributed pool of individuals through an open call (Brabham, 2008). Popular examples include InnoCentive, Threadless, the Goldcorp Challenge, Netflix, user-generated advertising competitions, Amazon Mechanical Turk, and iStockphoto, among others (Brabham, 2008).

The fact that crowdsourcing depends on a large, distributed, global network of individuals raises a set of new challenges. These challenges, which include dishonesty and plausible misjudgments, threaten the quality of results obtained through this process. Certain measures have been put in place to ensure high-quality, error-free results; however, little or no attention has been given to the efficiency and throughput of the crowdsourcing process or to the integrity of the results obtained. It is commonly assumed that the numbers of workers and tasks are small, which results in crowdsourcing techniques that are not conscious of the number of tasks, potential worker behavior, or the efficiency of the process. This research aims to propose a crowdsourcing result-improvement technique that is independent of task size and complexity and that ensures result quality and integrity as well as the efficiency and throughput of the process. The hypothesis being studied is that crowdsourcing result-improvement techniques that are independent of task size and complexity ensure result integrity, quality, efficiency, and throughput.

Hypothetical Designs

Quantitative Design

This design will utilize the experimental research method: methods that aim at maximizing the replicability, generalizability, and objectivity of results and that are mostly concerned with prediction (Creswell, 2009). The focus will be to test several existing crowdsourcing techniques, including r-Redundancy, v-Voting, and Vote Boosting, on a large number of tasks handled by a large number of users; these techniques will be treated as the experiment participants. The independent variable will be the crowdsourcing technique: r-Redundancy, v-Voting, Vote Boosting, and the technique that this research will propose. The independent variable will be studied at two levels: a low number of less complex tasks, and a high number of complex tasks. The dependent variables will be integrity, quality, efficiency, and throughput.

The experiment will be set up so that the test tasks have two defining parameters: the accuracy of the tasks' initial states and the number of options available per decision. Eight sets of 100,000 tasks will be generated, with 3, 4, or 5 options per decision and initial-state accuracies of 75%, 85%, and 95%; each task comprises about 4 to 10 decisions, normally distributed. The user network or population tested also has two parameters: the mean probability of committing errors and the mean probability of dishonesty. Values of 3%, 6%, and 20% will be used for both dishonesty and error-making, with the individual probabilities distributed exponentially over [0, 1] around their mean values. Simulations of about 40 input-aggregation functions, each receiving one input, will be run repeatedly.
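To make the simulation setup concrete, the following is a minimal sketch of how such an experiment could be wired together. It assumes an r-Redundancy scheme aggregated by plurality voting; all function names, parameter defaults, the combined error/dishonesty model, and the aggregation rule are illustrative assumptions, not the actual techniques under test.

```python
# Minimal sketch of the simulation described above: a worker pool with
# exponentially distributed error/dishonesty rates, tasks with k options,
# and r-redundant answers aggregated by plurality. All names and the
# specific behavioral model are illustrative assumptions.
import random

def make_workers(n, mean_error, mean_dishonesty):
    """Draw per-worker error/dishonesty probabilities from an exponential
    distribution around the given means, clipped to [0, 1]."""
    return [
        (min(random.expovariate(1 / mean_error), 1.0),
         min(random.expovariate(1 / mean_dishonesty), 1.0))
        for _ in range(n)
    ]

def worker_answer(truth, options, p_error, p_dishonest):
    """A worker reports the truth unless an error or a dishonest act
    occurs, in which case a wrong option is chosen uniformly (assumed)."""
    if random.random() < p_error or random.random() < p_dishonest:
        return random.choice([o for o in range(options) if o != truth])
    return truth

def plurality(votes):
    """Aggregate r redundant answers by plurality (ties broken arbitrarily)."""
    return max(set(votes), key=votes.count)

def simulate(tasks=100_000, options=4, redundancy=5,
             mean_error=0.06, mean_dishonesty=0.06, workers=1_000):
    """Run one parameter point and return the aggregate result quality."""
    pool = make_workers(workers, mean_error, mean_dishonesty)
    correct = 0
    for _ in range(tasks):
        truth = random.randrange(options)
        votes = [worker_answer(truth, options, pe, pd)
                 for pe, pd in random.sample(pool, redundancy)]
        correct += (plurality(votes) == truth)
    return correct / tasks

if __name__ == "__main__":
    for m in (0.03, 0.06, 0.20):  # the 3%, 6%, 20% parameter levels
        q = simulate(mean_error=m, mean_dishonesty=m)
        print(f"mean error/dishonesty {m:.0%}: quality {q:.4f}")
```

Sweeping such a function over every combination of option count, initial accuracy, error level, and aggregation function illustrates why the design quickly becomes expensive: each parameter point already costs hundreds of thousands of simulated worker answers.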
For this experiment, the proposed quantitative hypothetical design is deemed extremely expensive, even if only a few points in the parameter space are covered.

Qualitative Design

In this case, the research method will aim at understanding and discovering the perspectives, thoughts, and experiences of previous researchers and participants in the same field in order to understand reality, purpose, and meaning (Trochim & Donnelly, 2008). The focus will be to review and evaluate the literature on previous and current research on the existing crowdsourcing techniques, including r-Redundancy, v-Voting, and Vote Boosting, as applied to large numbers of tasks handled by large numbers of users. The independent variable would again be the crowdsourcing technique, including r-Redundancy, v-Voting, Vote Boosting, and the technique that this research will propose; the dependent variables will be integrity, quality, efficiency, and throughput. A thorough evaluation will be conducted, with proper consideration given to dishonesty, errors, efficiency of results, and throughput, to help understand purpose, reality, and meaning and to support an informed decision.

This design will not be appropriate for this study because it will be difficult to test the proposed crowdsourcing technique against the existing ones: there is no literature or case-study base to this effect. Additionally, it is difficult for researchers to set aside their own perceptions, biases, and experiences and remain objective while reviewing and evaluating previous case studies and literature on existing techniques.

Mixed Methods Design

This design combines both quantitative and qualitative research methods with the aim of bridging their differences in order to meet the objectives of the research (Research methods: Design of investigations, 2004). The independent variable would be the crowdsourcing technique, including r-Redundancy, v-Voting, Vote Boosting, and the technique that this research will propose; the dependent variables will be integrity, quality, efficiency, and throughput. This design will focus on establishing whether or not crowdsourcing result-improvement techniques that are independent of task size and complexity ensure result integrity, quality, efficiency, and throughput. This will be achieved through an experimental design in which simulations are run repeatedly on the existing techniques to ascertain their levels of efficiency, throughput, and result integrity, complemented by a review and evaluation of the literature on those techniques as applied to large numbers of tasks handled by large numbers of users. The results obtained will then be compared with the simulation results of the technique that this research will propose, as in the comparison sketch below.

This design would be ideal and appropriate for this study, since data will be collected and evaluated using varying methods and strategies that yield complementary overlapping and non-overlapping strengths and weaknesses. It compensates for the main weakness of the quantitative experimental method, namely its extremely expensive nature. Additionally, this design is complementary, pluralistic, and inclusive (Eysenck, 2004).
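The sketch below, reusing the helpers from the earlier block (make_workers, worker_answer, plurality), shows how two techniques could be compared head to head on the same simulated worker pool. The v-Voting rule shown (keep soliciting answers until one option accumulates v agreeing votes, up to a cap) is an illustrative reading of that technique, not a reproduction of its published definition.

```python
# Hedged sketch: compare fixed r-Redundancy plurality against an assumed
# adaptive v-Voting rule on one worker pool, reporting result quality and
# votes per task (a throughput proxy). Builds on the helpers defined in
# the previous sketch.
import random
from collections import Counter

def v_voting_answer(truth, options, pool, v=3, cap=15):
    """Solicit answers one worker at a time until some option has v votes
    or the cap is reached; return the leading option and the vote cost."""
    counts = Counter()
    for cost, (pe, pd) in enumerate(random.sample(pool, cap), start=1):
        counts[worker_answer(truth, options, pe, pd)] += 1
        leader, votes = counts.most_common(1)[0]
        if votes >= v:
            return leader, cost
    return counts.most_common(1)[0][0], cap

def compare(tasks=10_000, options=4, v=3, redundancy=5,
            mean_error=0.06, mean_dishonesty=0.06, workers=1_000):
    pool = make_workers(workers, mean_error, mean_dishonesty)
    r_ok = v_ok = v_cost = 0
    for _ in range(tasks):
        truth = random.randrange(options)
        # fixed-redundancy plurality voting
        votes = [worker_answer(truth, options, pe, pd)
                 for pe, pd in random.sample(pool, redundancy)]
        r_ok += (plurality(votes) == truth)
        # adaptive v-voting
        ans, cost = v_voting_answer(truth, options, pool, v)
        v_ok += (ans == truth)
        v_cost += cost
    print(f"r-Redundancy quality: {r_ok / tasks:.4f} ({redundancy} votes/task)")
    print(f"v-Voting     quality: {v_ok / tasks:.4f} ({v_cost / tasks:.2f} votes/task)")
```

Because v-Voting stops early once agreement is reached, its vote cost varies with worker quality, which is exactly the kind of efficiency/throughput trade-off the proposed research would measure alongside the qualitative literature review.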
References

Brabham, D. C. (2008). Crowdsourcing as a model for problem solving. The International Journal of Research into New Media Technologies, 14(1), 75–90.

Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches (3rd ed.). Thousand Oaks, CA: Sage Publications.

Eysenck, M. W. (2004). Psychology: An international perspective. New York: Taylor & Francis.

Research methods: Design of investigations. (2004). Psychology Press Ltd.

Trochim, W., & Donnelly, J. (2008). The research methods knowledge base (3rd ed.). Mason, OH: Cengage.