Database Design for Existing Dataset and Analysis

Table of Contents

Hardware Requirements Specifications
Technical Hardware Specifications
Introduction
Relational Databases
UML Database Diagram Design
Database Analysis
First Normalization
Second Normalization
Third Normalization
Conclusion
References

Abstract

Linked data platforms have seen increasing use over the last half-decade, as many people realized they could obtain any information they needed over the internet. This pushed developers to find ways of relating data so as to produce a results-oriented data delivery design. This database design project explores the implementation of a similar system that can deliver data to the end user and, at the same time, generate reports when queried. The database is expected to use the existing linked data platform to realize this objective seamlessly.

Requirements for Implementation of the System

Database Support Software

With regard to the data that will be used in the database, including the retrieval requirements for this data, a number of criteria were noted that the database software has to meet. The software's minimum requirements are as follows:

1. It should allow stored SQL scripts to run. A number of processes can be automated by the use of stored scripts; editing, querying, updating and facility management are examples. This control becomes even more important when user access happens over the internet: it is easier to call and run a stored script than to code a script before running a process.

2. The software should allow restrictions on the values of data entered into table columns.

3. Multiple index creation should be easily implementable, and a single index should be reusable across many columns. This allows quicker querying and sorting over the many parameters the database presents, and a unique index also prevents duplicate data across the indexed columns.

4. Simultaneous access to tables by multiple users should be allowed, because data will be accessed from a number of locations via web browsers. It is therefore very important for users to have simultaneous access to data.

5. The software has to support the relational database model. This is the industry standard and eases integration with the other technologies present at the various locations from which this data will be accessed.

6. The software should allow the creation of various views on the entered data. This allows a minimal amount of data to be stored, with the rest presented through an unlimited number of virtual outputs. Through views, calculated values are easily displayed without any additional columns in the data tables, and customizable views can be built by linking multiple tables.

7. The software has to support replication across the servers it runs on, since the data is going to reside on two different servers.

8. The software should support data table triggers, which allow actions to be predetermined for when information is deleted from or entered into data tables.

9. The software should allow for the entry of data over the internet.

10. The software has to run in a Linux environment.

Based on all the requirements listed above, the most suitable software combination was found to be the LAMP stack (Linux, Apache, MySQL, PHP). These products fully meet the requirements of this implementation, and MySQL in particular was found to offer fast data processing for web services.
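Several of the requirements above (column value restrictions, multi-column indexes, views and triggers) can be sketched directly in SQL. The snippet below is a minimal illustration using Python's built-in sqlite3 module standing in for the MySQL server described; the table and column names are hypothetical and chosen only to echo this paper's dataset.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server described above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Requirement 2: restrict the values that may be entered into table columns.
cur.execute("""
    CREATE TABLE programme (
        pid        TEXT PRIMARY KEY,
        media_type TEXT NOT NULL CHECK (media_type IN ('audio', 'video')),
        duration   INTEGER CHECK (duration > 0)
    )
""")

# Requirement 3: one index spanning several columns to speed querying and sorting.
cur.execute("CREATE INDEX idx_type_duration ON programme (media_type, duration)")

# Requirement 6: a view exposing a calculated value without an extra stored column.
cur.execute("""
    CREATE VIEW programme_minutes AS
    SELECT pid, media_type, duration / 60 AS minutes FROM programme
""")

# Requirement 8: a trigger that predetermines an action taken on deletion.
cur.execute("CREATE TABLE audit_log (pid TEXT, action TEXT)")
cur.execute("""
    CREATE TRIGGER log_delete AFTER DELETE ON programme
    BEGIN
        INSERT INTO audit_log VALUES (old.pid, 'deleted');
    END
""")

cur.execute("INSERT INTO programme VALUES ('b006m86d', 'video', 3600)")
cur.execute("DELETE FROM programme WHERE pid = 'b006m86d'")
print(cur.execute("SELECT * FROM audit_log").fetchall())  # [('b006m86d', 'deleted')]
```

The same DDL statements carry over to MySQL with only minor syntax changes, which is one reason the relational model is singled out as the industry standard above.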
PostgreSQL, which is also open source, could be used as an alternative and would save heavily on the costs incurred in implementing this project.

Hardware Requirements Specifications

The server will be used both as the database server and the web server. To realize this, a chassis that accommodates an internally mounted LTO tape drive has been chosen. The server is accessed via the internet, and the hardware and software configurations are aligned. The hardware manufacturer has to be globally known and highly reputable.

Technical Hardware Specifications

These are the detailed requirements that the server should meet. The specifications are primarily determined by the database size, the project life expectancy and the number of users. The project was assumed to have a life expectancy of 5 years. The specifications are outlined below:

Front Side Bus: 300 MHz
Memory: 8 DIMM sockets, board configurable up to 6 GB
Hard Drive: 5 GB at 6,000 rpm
Diskette Drive: 1.5 MB
Monitor: 14 inch
Optical Drive: CD-ROM/DVD-ROM
Graphics Card: 12 MB RAM with integrated controller
Cache: 1,024 KB
Expansion Slots: 4 (2 x 64-bit/100 MHz, 2 x 64-bit/133 MHz)
Processors: Dual core, 1.8 GHz
Mouse: Scroll wheel, left and right buttons
Keyboard: Standard
Network Adapter: Load balancing and failover support; up to 100 Mbps connection speed; dual port
Hard Drive Backplane: On-board RAID allowing for 5 drive connections
RAID Controller: Embedded; can handle RAID 5 and RAID 1
Memory: 1.5 GB at 512 MHz
Secondary Controller: Compatible with the tape backup unit
Power Supplies: 1,000 W, 200 VAC
Chassis: Tower
Operating System: Ubuntu Linux, versions 10.04 to 14.04
Updated Drivers: NIC
OS Documentation: Ubuntu 14.04 Linux documentation
Management Software: Remote administration access; automatic diagnosis
Management Features: Facilitate installation, configuration and system set-up; paging or email support; fault-management alarms; system resource monitor; inventory; server resource management; RAID controller array management
Environmental Parameters: 25 °C average operating temperature; 12%-75% operating non-condensing relative humidity; 7%-90% non-condensing storage humidity
UPS: Stand-alone 2,100 VA/1,200 W unit at 200 V, allowing 45 minutes of runtime at half load; 40/50 Hz +/- 2 Hz auto-sensing frequency; 2.5-hour recharge time; leakproof; replace-or-repair warranty (2 years); noise filtering; zero clamping; optional EPO; 360 J surge energy rating
Documentation: User manuals; installation guides

Table 1: Hardware Technical Specifications

Introduction

As consumer demands on data content keep changing, there has been a growing need for adaptive systems, that is, systems that can easily be molded to fit consumers' needs. This has also been the case for systems providing access over the internet, where the integration of different technologies has become increasingly important in addressing constantly changing consumer demands. This is a design project whose main aim is to develop a database that can provide services seamlessly. Linking of data has become very necessary due to the dynamic demands of clients in how they want to access data. An example is BBC's data content provision system.
The system links BBC's audiences with its applications by enabling access to them through a number of access protocols. The database design covered in this project uses a number of accepted protocols to realize a database that acts as a linked data platform. Its scope is not limited to one geographical area, since the database can be accessed through web interfaces written in PHP and running on an Apache web server. The data is stored in MySQL on a TurnKey LAMP server running in Oracle's VirtualBox. The design allows for easy data access and entry across a number of separate locations and continents. The main driving force behind this project was to realize a design that is efficient, easily normalized and scalable, with a high ease of both data access and data entry; achieving this fulfils the main objective of the design process.

Relational Databases

Relational databases are the subject of this entire paper; indeed, the database to be designed is itself relational. This type of database originated in a paper published by Edgar Codd in 1970, the same paper on which the biggest relational database company, Oracle Corporation, would later be founded. The model was platform-independent, which gave it an advantage over all other database systems existing at the time. Relational database systems are realized through tables of records with specific attributes. These attributes are linked to other tables through shared primary keys, which are referred to as foreign keys when shared. With such a system, items in different tables can access one another depending on the permissions outlined in the sharing procedure. The primary key uniquely identifies a given item in a table.
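The primary key/foreign key relationship described above can be sketched as follows. This is a minimal example using Python's sqlite3 module; the brand and programme tables and their columns are hypothetical, loosely modeled on the attributes analyzed later in this paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
cur = conn.cursor()

# A primary key uniquely identifies each record within its own table...
cur.execute("""
    CREATE TABLE brand (
        brand_pid  TEXT PRIMARY KEY,
        brand_name TEXT NOT NULL
    )
""")

# ...and becomes a foreign key when shared with another table.
cur.execute("""
    CREATE TABLE programme (
        pid       TEXT PRIMARY KEY,
        title     TEXT NOT NULL,
        brand_pid TEXT REFERENCES brand (brand_pid)
    )
""")

cur.execute("INSERT INTO brand VALUES ('b1', 'BBC Sport')")
cur.execute("INSERT INTO programme VALUES ('p1', 'Match of the Day', 'b1')")

# The shared key lets records in different tables reach one another via a join.
row = cur.execute("""
    SELECT p.title, b.brand_name
    FROM programme p JOIN brand b ON p.brand_pid = b.brand_pid
""").fetchone()
print(row)  # ('Match of the Day', 'BBC Sport')
```

With the foreign-key pragma enabled, attempting to insert a programme row that references a nonexistent brand raises an integrity error, which is exactly the kind of permissioned, controlled sharing the text describes.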
This data storage approach proved more efficient than its predecessors because it significantly reduced the disk space required while increasing the speed of access to database records. A relational database is manipulated using the Structured Query Language, commonly referred to as SQL. SQL was formalized by the American National Standards Institute (ANSI) in 1986, and its most recent revision happened in 2011. The language builds directly on relational algebra and is very user-friendly. Database servers that use the relational model include, for instance, Microsoft SQL Server and Oracle's servers. Microsoft Access also offers many features of a relational database, for instance the ability to use SQL in its projects; however, it is not necessarily a full relational database software system.

Understanding BBC Ontologies

Data Organization

Prior to the launch of SQL Server 2008, the storage and management of unstructured data was very tricky. Unstructured data refers to data with no predefined organization; it is normally text-heavy and is the most commonly used form for carrying and relaying information. Such data could be stored as an IMAGE or in VARBINARY form, which ensured transactional consistency and reduced management complexity, but at the cost of reduced performance. The common alternative, storing this data in disk files that were then linked to structured data, presented many complexities and consistency overheads. The same applied to data accessed over the internet: since unstructured data was the most accessed kind, its accessibility presented many complexities. BBC provides this kind of data to many customers spread across the world.
This raises the challenge of serving content both efficiently and seamlessly, regardless of the data structure or the client's location in the world. BBC achieves this through ontologies covering the different areas its customers may find interesting: through them, its audience can enjoy BBC Education, BBC Sport, BBC Music, BBC Concepts, news projects and many other things. These ontologies are the basis of Linked Data platforms, the subject matter of this paper. The advantages that ontologies present to a business entity are numerous; one of the most notable is that they can expand with the business requirements and consumer needs. With such a platform in place, a business can invest confidently in more content production with the guarantee that the product will be delivered to the consumer efficiently. Ontologies ride on the existing platform that Linked Data presents in delivering this plethora of services. Linked Data is simply the linking of web content so that a person can explore or enjoy more of what they are already looking at: once a user chooses to view some content, more content similar to the topic the client is currently consuming is generated. A link is a relationship created between resources that refer to one another via uniform resource identifiers (URIs). While URIs apply primarily to hypertext documents written in HTML, RDF (Resource Description Framework) describes the links used to connect similar data resources. A number of conditions have to be met to achieve this linkage of data resources. The first is that resources are identified by URIs; by using names instead of digits, one can relate similar resources easily. Secondly, web information should be served against a URI.
Through such a system, the classes and properties of information contained in OWL, RDFS and RDF ontologies can be easily accessed. These, among others, are the main conditions that have to be met to realize a linked data platform.

UML Database Diagram Design

Fig. 1: UML Database Design

Database Analysis

The analysis of the SQL data involves interlinking related data so that simpler models of access and organization are achieved. The following variables will be analyzed for this process:

pid, start_time, end_time, epoch_start, epoch_end, complete_title, media_type, masterbrand, service, brand_pid, is_clip, categories, tags

The analysis of the SQL file involves breaking the data down into pieces that can be understood and organized in the database system. This process of breaking down related data is called normalization, and it has different levels: first, second and third normalization.

First Normalization

In the first normalization, the data is linked as it generally appears in the UML design above: tables are drawn and attributes are linked in a general way according to how they are related. For instance, since media type is the most commonly appearing entity, it tends to appear as the link in the first normalized tables of this database, alongside entities and attributes such as Brand, Service, Time, Category, Tags, Epoch, pid, Masterbrand, Is_clips and Complete_title. It is worth noting that the relations focus on the main entities in the database.

Second Normalization

The second normal form of the database is created by building tables that further relate the attributes identified in the first normal analysis. Here the relations extend into the attributes that a given entity has. For instance, the entity can be Time, which has four attributes: start_time, end_time, epoch_start and epoch_end.
This is shown below:

Media_type: Media_type_id, Category
Time: Start_time, End_time, Epoch_start, Epoch_end
Service: Media_type_id, Service_Name, Service_Type
Brand: Masterbrand, Brand_PID, Brand_Name

All of these attributes, despite bearing exclusive features, are placed in the same table as long as they belong to the same entity. This type of normalization moves from the overall level to the entity level.

Third Normalization

In the third normalization of the database, the attributes themselves are defined: tables focusing on a single unique feature of a given attribute are now created. An example is the Time entity, which has both epoch and time attributes. When normalizing to this third stage, these two unique features are separated into two different tables, as shown below:

Time: Start_Time, End_Time
Epoch: Epoch_Start, Epoch_End
Service: Service_Name, Service_ID
Brand: Brand_PID, Brand_Name

Normalization is instrumental in achieving data that is easily accessible to all.

Conclusion

The data was ably analyzed through the normalization of its tables into pieces that a user could easily relate to and query. The data was then linked to a web-based platform where PHP code enabled the generation of reports for any queries a user needed. The entire experiment was a success.

References

BBC (n.d.), "Ontologies", BBC Online. Available at: http://www.bbc.co.uk/ontologies (last accessed March 5, 2015).
