Data Management in Cloud Environments

Executive Summary

Cloud computing has gained importance by giving end users access to a wide pool of information. However, the component that forms the basis of such operations, and that is often overlooked, is the database management system in the cloud environment. This research study will focus on identifying the different data storage systems that belong to the SQL and NoSQL frameworks. The study not only outlines the different data storage engines but also brings out the contrasts between these data management systems. Research already conducted in this field forms the framework for the study. The major reason for selecting this topic is that it will help to build knowledge of the multiple ways to store data in a cloud environment. The study will also outline the challenges that data stores face and the ways in which most of those challenges can be addressed. In short, it deals with the intricacies of such data management systems, and it will be supported by practical examples so that the audience gains insight into the internal functioning of cloud computing.

Research Plan

Research Aims and Problem Statement

In the past few years, growth in computational power has been accompanied by an overwhelming growth in data flow. Recent advances in web technology have enabled users to store content of various sizes in the cloud, which is what data management in a cloud environment refers to. There has been a paradigm shift in the mechanisms of large-scale data processing and in computing infrastructure, and cloud computing provides that infrastructure.
This research study aims to clearly distinguish the various data management techniques used in cloud environments. The main aim can be subdivided into identifying the different data storage engines, both SQL and NoSQL. The study will not only identify the engines but will also cover their respective pros and cons, their performance issues, and how they compare in scalability. The research problem statement is: “Identification of different data storage engines in cloud environment and outlining performance and scalability between these data management types.” This problem statement, or research question, will be addressed throughout the analysis. The major objective is to highlight how such SQL and NoSQL data storage engines work. Earlier research on cloud environments has not established such a comparison between NoSQL and SQL data management techniques, and this study intends to provide it. It will cover key challenges associated with scalability, economical processing, and consistency of large volumes of data in the cloud environment.

Significance

According to Ohlhorst (2013), technological advancement and the proliferation of mobile devices and sensors connected to the Internet have given rise to large volumes of data that need to be stored and processed. The traditional relational database management system (RDBMS) was designed at a time when hardware, processing, and storage requirements differed from those placed on a DBMS today. This shift in requirements poses challenges for the scaling and performance that Big Data demands. Big Data refers to massive, complex data sets made up of different kinds of data: structured, semi-structured, and unstructured.
Big Data can also be defined on the basis of the 3Vs: volume, velocity, and variety. Cloud computing has emerged as a new paradigm that provides on-demand network access to a wide range of computational resources (Ohlhorst, 2013). According to Cattell (2011), leveraging cloud infrastructure requires appropriate application design and implementation as well as a suitable database management system. A data management system in a cloud environment needs several properties. The first is high performance and scalability, because applications today see continuous growth in the number of end users served, the amount of data stored, and overall throughput. The second is the ability to run on heterogeneous commodity servers, since that is what cloud environments are built on. Elasticity, fault tolerance, availability, privacy, and security are further critical factors (Cattell, 2011). The traditional RDBMS cannot fully meet these criteria or handle Big Data, and so different solutions have been developed in the past few years to address these concerns. As Tudorica & Bucur (2011) state, the specialized solutions for such data storage are referred to as NoSQL and SQL data stores, and they present themselves as alternatives for data processing that can scale and manage huge volumes of data. The number of existing solutions, and the question of which one is appropriate, make it difficult to gain an overview of the domain and to select the best solution for a given problem. It is often difficult for an application to store or process a massive data set in one place, so the data is partitioned and the partitions are stored across several server nodes.
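
The partitioning step described above can be illustrated with a minimal sketch. It assumes simple hash partitioning, which is only one of several possible schemes and is not claimed to be what any particular data store uses. The function names `partition_for` and `partition_records` are hypothetical, and each "server node" is modeled as a plain in-memory dictionary.

```python
import hashlib


def partition_for(key, num_nodes):
    """Map a key to a node index with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap it
    # into the range of available nodes.
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition_records(records, num_nodes):
    """Spread a dict of records across num_nodes in-memory 'nodes'."""
    nodes = [{} for _ in range(num_nodes)]
    for key, value in records.items():
        nodes[partition_for(key, num_nodes)][key] = value
    return nodes


if __name__ == "__main__":
    data = {"user:%d" % i: {"id": i} for i in range(10)}
    for index, node in enumerate(partition_records(data, 3)):
        print(index, sorted(node))
```

Because the hash is stable, any client can compute which node holds a key without consulting a central directory. A drawback of this naive modulo scheme is that changing the number of nodes remaps most keys.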
These partitions are further replicated across multiple servers so that no data is lost if a server fails. Data stores such as BigTable and Cassandra use this approach to implement scalable, highly available solutions that can be leveraged effectively in cloud environments. However, the CAP theorem highlights the restrictions that apply to such replicated, networked data stores. According to the theorem, a distributed data system can guarantee at most two of three properties at the same time: consistency, availability, and partition tolerance (Tudorica & Bucur, 2011). According to Pokorny (2011), consistency here means that there is a single up-to-date instance of the data. It is not the same as consistency in an RDBMS, which refers to keeping the database in a valid state at all times. Availability means that data can be served whenever a request arrives, and partition tolerance refers to the ability of the data system to tolerate network partitions. The theorem can be illustrated with a data store that has been split by a network partition into two sets of nodes. If the store denies all write requests, the data is not available but remains consistent. If, on the contrary, one or both partitions accept write requests, the store remains available but may become inconsistent (Pokorny, 2011). The term NoSQL was first used by Carlo Strozzi in 1998 and was reintroduced in 2009 by Johan Oskarsson to refer to open source, non-relational, distributed databases. As per the analysis of Bughin, Chui & Manyika (2010), these data stores take the position that an SQL style of querying is not essential to the design of a data storage engine. NoSQL is often considered a good fit for cloud-based data management.
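
The CAP trade-off discussed in this section (deny writes and stay consistent, or accept writes and risk inconsistency) can be sketched as a toy model. This is an illustration under simplifying assumptions, not a description of how any real data store behaves: the class `ReplicatedStore`, its two in-memory replicas, and the `"CP"` and `"AP"` mode labels are all hypothetical.

```python
class ReplicatedStore:
    """Toy two-replica store that models a network partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]
        self.partitioned = False

    def write(self, replica_id, key, value):
        """Return True if the write was accepted, False if rejected."""
        if not self.partitioned:
            # Healthy network: the write reaches every replica.
            for replica in self.replicas:
                replica[key] = value
            return True
        if self.mode == "CP":
            # Prefer consistency: refuse writes while partitioned.
            return False
        # Prefer availability: accept the write on the reachable replica only.
        self.replicas[replica_id][key] = value
        return True

    def read(self, replica_id, key):
        return self.replicas[replica_id].get(key)

    def is_consistent(self):
        return self.replicas[0] == self.replicas[1]


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = ReplicatedStore(mode)
        store.write(0, "balance", 100)
        store.partitioned = True
        accepted = store.write(0, "balance", 50)
        print(mode, "accepted:", accepted, "consistent:", store.is_consistent())
```

In the consistency-first mode the write is rejected and both replicas still agree. In the availability-first mode the write succeeds but the replicas diverge until they are reconciled, which the sketch does not attempt.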
However, many enterprises do not adopt NoSQL because of its reliance on low-level query languages, their large existing investments in SQL infrastructure, and the absence of standardized interfaces. This research study will help to close the gap in the existing literature by evaluating both NoSQL and SQL data stores (Bughin, Chui & Manyika, 2010). There is clear evidence on the different data storage engines in both categories, but there is a lack of proper analysis of their scalability and performance.

Research Accomplishments till Date

Some research has already been carried out in this field. This prior work mainly covers the various data storage engines and their characteristics, along with the opportunities and challenges associated with them. NoSQL data stores are commonly divided into four broad categories: key-value stores, column-family stores, document stores, and graph databases. NewSQL, which shares certain features with SQL, is considered a hybrid between relational databases and NoSQL stores. Key-value stores have a simple data model based on key-value pairs. The key identifies the value and is used to store the value in the data store and retrieve it again. The value is opaque to the data store and can hold different types of data, such as a string, integer, array, or object, which gives a schema-free data model. These stores are not only schema-free but also highly efficient at storing distributed data. However, they are not well suited to structures or scenarios that require relations. Such functionality has to be implemented in the client application that interacts with the key-value store.
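
The key-value model described above, including the point that relations must be resolved by the client, can be sketched as follows. This is a minimal in-memory illustration, not the API of Redis, Riak, or any other product. The class `KeyValueStore`, the helpers `save`, `load`, and `orders_for_user`, and the `user:` and `order:` key prefixes are all hypothetical.

```python
import json


class KeyValueStore:
    """In-memory store that treats every value as opaque bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


def save(store, key, obj):
    """Client-side serialization: the store only ever sees bytes."""
    store.put(key, json.dumps(obj).encode("utf-8"))


def load(store, key):
    raw = store.get(key)
    return None if raw is None else json.loads(raw.decode("utf-8"))


def orders_for_user(store, user_id):
    """Resolve a user-to-orders relation in the client, one key at a time."""
    user = load(store, "user:%s" % user_id)
    if user is None:
        return []
    return [load(store, "order:%s" % oid) for oid in user["order_ids"]]


if __name__ == "__main__":
    kv = KeyValueStore()
    save(kv, "user:7", {"name": "Asha", "order_ids": [1, 2]})
    save(kv, "order:1", {"item": "disk", "qty": 2})
    save(kv, "order:2", {"item": "cable", "qty": 5})
    print(orders_for_user(kv, 7))
```

The store itself offers only `put`, `get`, and `delete` by key. Everything that looks like a join lives in `orders_for_user`, which is the client-side implementation the text refers to.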
Because the value is opaque, these stores can only be queried through keys and cannot support indexing or querying at the data level. Some key-value stores, such as Redis and Memcached, keep data in memory, while others, such as Voldemort, Riak, and Berkeley DB, keep data on disk. Column-family stores, on the other hand, are mostly derived from Google's BigTable, in which data is stored in a column-oriented manner. A BigTable data set consists of several rows, each identified by a row key that is often termed the primary key. Each row is composed of column families, and different rows may have different columns. In relation to key-value stores, the row key plays the part of the key and the column families play the part of the value. Each column family, in turn, acts as a key to one or more columns, where each column is a name-value pair. Hadoop HBase directly implements the concepts of Google BigTable, whereas DynamoDB and Amazon SimpleDB have a different data model: they hold only a set of column name-value pairs in each row and have no column families. Compared with key-value stores, column-family stores offer better indexing and querying, because they can use column families and columns in addition to row keys. Document stores are another form of NoSQL data storage engine. They can be regarded as a derivative of the key-value model, since they use a key to locate a document within the store. The documents are generally represented in JSON (JavaScript Object Notation) or a format derived from it. Document stores are beneficial for applications whose input data is naturally represented as documents.
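
A minimal sketch of the document model: JSON-like documents are located by key, as in a key-value store, but the store can also look inside them. The class `DocumentStore`, its `find` method, and the dotted field-path syntax are hypothetical choices made for illustration. The query here is a plain scan, whereas a real document store would typically maintain indexes.

```python
_MISSING = object()


def _lookup(document, path):
    """Follow a dotted path such as 'address.city' through nested dicts."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


class DocumentStore:
    """In-memory store of schema-free, JSON-like documents."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        self._docs[doc_id] = document

    def get(self, doc_id):
        """Key-based access, as in a key-value store."""
        return self._docs.get(doc_id)

    def find(self, path, expected):
        """Content-based access: ids of documents whose field matches."""
        return sorted(
            doc_id
            for doc_id, doc in self._docs.items()
            if _lookup(doc, path) == expected
        )


if __name__ == "__main__":
    store = DocumentStore()
    store.insert("u1", {"name": "Asha", "address": {"city": "Pune"}})
    store.insert("u2", {"name": "Ben", "tags": ["admin"]})
    print(store.find("address.city", "Pune"))
```

The two documents have different fields, which the store accepts without any schema. Documents that lack the queried field are simply skipped by `find`.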
The data held in a document store may be complex, for example containing nested objects, and need not follow any fixed schema. Document stores allow documents to be indexed by their contents as well as by primary key. This ability to index and query on document contents is what differentiates the document model from key-value stores. Graph databases use graphs as their data model and are grounded in graph theory. They can store relationships between data nodes efficiently. In a graph database, both nodes and edges have their own properties in the form of key-value pairs. Graph databases specialize in managing interconnected data and are highly efficient at traversing relationships between entities. They are well suited to pattern recognition, social network applications, recommendation systems, and the path-finding problems posed by navigation systems. Neo4J can be considered an efficient graph database for handling graphs and relationships (Abiteboul, Manolescu, Rigaux, Rousset & Senellart, 2012). However, existing graph databases do not always scale well horizontally: when nodes are stored on multiple servers, traversing from one to another across servers is costly. SQL-based (NewSQL) solutions, on the other hand, are built on the relational model. Clustrix, VoltDB, and NuoDB provide clients with a relational view of the data set. Google Spanner, by contrast, is based on a semi-relational model in which tables are treated as mappings from the primary-key columns to the rest of the columns. Security is a major factor that NoSQL data stores often overlook, because their main focus is on improving the overall infrastructure for data management.
Authentication should be built into data storage engines so that a user's identity can be verified when data is accessed. The main means of maintaining data security include encryption, authorization, and auditing. There are three important points at which data must be secured: data at rest, server-to-server connections, and client-to-server connections. SQL data stores are mainly used for applications built on transactions that manipulate objects or that have consistency requirements. The financial sector provides an example: a money transfer requires both accounts to be updated atomically. In such a scenario every application needs to see the same state of the database, which SQL servers make possible. The researcher has competency in the field of information technology, which provides an opportunity to analyze different data storage systems. In addition, an inclination towards understanding the internal workings of cloud computing provides the insight the study requires.

Methods

The aim of the research study is precise, and achieving its goals and objectives requires effective design and planning. Research methodology can be considered the most essential pillar of the entire study. Various techniques can be used as research methods, and the choice of an appropriate method depends on the nature of the study. This study focuses on analyzing the different data storage engines and comparing them in order to identify the best option for a critical situation. It will be based on both primary and secondary research. The primary research tools will be a questionnaire survey and a focus group study.
Both tools will be applied to IT professionals, as they are best placed to answer questions about database management in cloud environments. The major advantage of a focus group study is that it lets the researcher examine different points of view before reaching a conclusion. Its weakness is that the researcher may at times interfere in the group discussion and disrupt the process, or group members may withhold accurate information for reasons of confidentiality. A questionnaire survey is beneficial because the data obtained can be analyzed statistically. However, a survey is time-consuming, and respondents often do not participate actively. The other method is secondary research: analyzing existing articles, journals, books, and internet sources to derive information relevant to the research topic. Together these techniques should be effective, as the range of opinions and views gathered will help to achieve the research aim of comparing and contrasting different database management systems in cloud environments.

References

Abiteboul, S., Manolescu, I., Rigaux, P., Rousset, M.C., & Senellart, P. (2012). Web Data Management. New York: Cambridge University Press.
Bughin, J., Chui, M., & Manyika, J. (2010). Clouds, big data, and smart assets: ten tech-enabled business trends to watch. McKinsey Quarterly.
Cattell, R. (2011). Scalable SQL and NoSQL data stores. ACM SIGMOD Record, 39(4), pp. 12-27.
Ohlhorst, F. J. (2013). Big data analytics: turning big data into big money. New Jersey: John Wiley & Sons.
Pokorny, J. (2011). NoSQL Databases: a step to database scalability in Web environment. Int J Web Info Syst. 9(1), pp. 69-82.
Tudorica, B. G., & Bucur, C. (2011). A comparison between several NoSQL databases with comments and notes. IEEE, 5(1), pp. 15-23.
