Database Management System Assignment Example

This thesis provides a benchmark to assess the performance of storing binary large objects (BLOBs) in a relational database management system. A benchmark framework that reduces the development and integration costs of benchmarking the database performance of a specific application is included. A randomly generated test script provides variation among test cases and consistency between different runs of the benchmark. The benchmark is platform independent and can be extended to support any database vendor that provides a Java data source for connectivity. The benchmark framework requires the implementation of one simulation class for each application benchmark, which significantly reduces the cost of benchmark development. Deciding which database system, operating system, or hardware configuration is best suited for an application is now feasible. Existing benchmarks do not provide any analysis of storing BLOBs within the database and are generally completed by the database or hardware vendor rather than the consumer.

CHAPTER 1 INTRODUCTION

With the passage of the Utah Digital Signature Act of 1995 [1] and the Federal Electronic Signature Act of 2000 [2], more and more paper documents are being stored electronically. These digitized documents have "electronic signatures" attached to them that serve not only as unique identifiers but also make them legally binding and enforceable. One obvious medium for quickly storing and accessing these electronic documents is the database. Many online and distributed systems require efficient access to thousands of binary files such as music (the sample files streamed to potential customers), digital signatures, and other documents and videos that generally range in size from 10 KB to 5 MB. The storage and retrieval of these items can be accomplished through a database system using a column type capable of storing binary data (commonly referred to as a "binary large object" or BLOB).
There are many existing database systems capable of storing BLOBs. Their functionality, performance and cost vary greatly among vendors (with some costing as much as $25,000 for a single CPU license). The prohibitive cost often keeps organizations from conducting full assessments of database systems for possible use in their operations, potentially causing a loss of revenue. Entities with limited resources need an affordable means of assessing or benchmarking their own database utilization.

This thesis, first and foremost, provides a database benchmark for storing BLOBs in a database. Secondly, it provides a database benchmark framework, with the BLOB database benchmark as the example implementation. As a result:

Developers will finally have a database benchmark for the storage of small, medium and large BLOBs in a database.
Users will have a reliable and affordable means to benchmark their applications' database utilization on multiple database systems.
Users will be able to determine the performance of different accessible system configurations (hardware, database and operating systems).
Users will be able to get results reflecting the configuration capability of their organization. Running the benchmark locally will allow reliable replication of the benchmark without reading the large disclosure documents provided with results from commercial benchmarks.
The software development community will now have a benchmark for storing BLOBs in a database. This benchmark fills a void, as most database benchmarks focus on OLTP (online transaction processing) workloads that consist of very small records, and none focus on the general storage of BLOBs.

CHAPTER 2 REVIEW OF LITERATURE-EXISTING DATABASE BENCHMARKS

My research indicates that no prior work has compared the ability of database systems to store BLOBs.
Similar research on database performance (for datasets that do not include binary data) does exist and comes from the following sources: the Transaction Processing Council (TPC) [3], the Storage Performance Council (SPC) [4], the Open Source Database Benchmark (OSDB) [5], the Engineering Database Benchmark (EDB) [6], and the Wisconsin Database Benchmark (Bitton) [6]. In addition to not considering BLOBs, the TPC and SPC benchmarks are implemented by vendors who perform many specialized database and operating system configurations. These nonstandard configurations take advantage of the vendors' specialized knowledge of the hardware, database, operating system and benchmark. Thus, these benchmarks are not pertinent to this discussion. This thesis provides the first benchmark to assess the storage of BLOBs in relational database systems. The performance results are categorized into three general BLOB sizes: small, medium and large (see Chapter 7 for a detailed definition of the BLOB datasets). Below is a brief description of the five similar database performance resources.

2.1 Transaction Processing Council (TPC)

Each TPC benchmark consists of a set of functional requirements to be run on any transaction processing system, independent of the hardware or operating system. "It is then up to the test sponsor to submit proof (in the form of a full disclosure report) that they have met all the requirements" [7]. The TPC has four non-obsolete database benchmarks: TPC-C, TPC-H, TPC-R and TPC-W. These four benchmarks define the requirements for database utilization, scaling, transaction types and how the business activities should be modeled. The TPC benchmarks have two shortcomings. First, vendors implement and perform the benchmarks. Second, users cannot assess the benchmark's performance on their own hardware or modify the benchmark to closely resemble their applications.
Because of these shortcomings, developers who want to realize similar performance results must use the exact hardware and configuration used to implement the benchmark and then model their application after the benchmark definition rather than their own needs. While the TPC benchmarks have value, none of them deals with the storage of BLOBs in the database or provides a standard code base that users can extend, enhance or even run themselves. A general description of each of the four current TPC benchmarks follows:

A. TPC-C is an OLTP benchmark for a wholesale parts supplier. It is the latest revision of the original TPC-A benchmark, which was the first benchmark implemented by the TPC and was described as an OLTP Debit-Credit benchmark [7].
B. TPC-H is a decision support benchmark. "This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions" [8].
C. The TPC-R benchmark "is a decision support benchmark similar to TPC-H, but which allows additional optimizations based on advanced knowledge of the queries" [9].
D. TPC-W benchmarks typical activities of a business-oriented web server for a transactional e-commerce application [10].

2.2 Storage Performance Council (SPC)

The SPC has two benchmarks (SPC-1 and SPC-2) for analyzing the performance of storage subsystems. These benchmarks are very general and not specific to database performance. "SPC-1 represents a segment of applications characterized by predominately random I/O operations and requiring both queries as well as update operations (for example OLTP systems, database systems, or mail server applications)" [4:13]. Like the TPC benchmarks, both SPC-1 and SPC-2 are run by vendors trying to show excellent performance for their hardware or software.
2.3 Open Source Database Benchmark (OSDB)

OSDB is based on the ANSI SQL Scalable and Portable Benchmark (AS3AP), documented in The Benchmark Handbook [6]. The AS3AP specification defines a set of single-user and multi-user tests that are run against a database and scaled to complete in 12 hours. The single-user tests benchmark the basic functions that a relational database must support to comply with the ANSI SQL 2 standard. The multi-user tests benchmark OLTP, information retrieval, and "mixed workloads including a balance of short transactions, report queries, relation scan, and long transactions" [6]. As with the other benchmarks, the OSDB does not provide any analysis of storing BLOB fields in the database. However, because it is open source, users can run this benchmark and alter it.

2.4 Engineering Database Benchmark (EDB)

The EDB was developed specifically for benchmarking the typical database performance of engineering products such as computer-aided design (CAD) applications and is also documented in The Benchmark Handbook [6]. Its focus on engineering database utilization limits its scope to the domain of engineering applications, unlike the BLOB benchmark, which is not limited to one particular domain.

2.5 Wisconsin Database Benchmark [6]

This benchmark was developed in 1983 and is no longer relevant due to the exponential improvements in hardware, software, and database-related capabilities.

CHAPTER 3 BLOB BENCHMARK ADVANTAGES

Compared with the five benchmarks discussed in Chapter 2, the BLOB benchmark and framework provide the following advantages for the typical small development group:

First, the BLOB benchmark provides a simple interface (shown in Figure 3.1) that lets users, rather than hardware or software vendors, run the benchmark. This allows users to assess multiple database systems on a hardware and operating system configuration that is familiar and accessible to them.
That is, users can run the BLOB benchmark on their own computer systems and determine which database, hardware, or operating system configuration is optimal for their needs.

Figure 3.1. User interface used to configure and start the BLOB benchmark

Second, users configure the hardware, software and network, which forces the benchmark to reflect the end user's technical ability to configure the system. When users configure the benchmark, they are able to evaluate and understand the effects of the configuration changes they make. This helps users avoid the pitfalls associated with some configuration changes made by vendors to speed up a particular benchmark.

Next, the benchmark can run on any operating system with a compatible Java Virtual Machine, including UNIX, Windows, Linux, and Macintosh. Thus, the benchmark provides flexibility in hardware, operating system, and database choices.

Another advantage is that the BLOB benchmark provides database connection pooling, individual test case management and statistics reporting, thus reducing benchmark development costs, as developers only need to simulate or utilize their own applications.

Further, the BLOB benchmark uses fundamental object-oriented programming techniques to define a SqlProvider, TestScriptGenerator, TestCase, and DBRecords (see Section 4.1). The SqlProvider creates and executes the SQL statements required for a specific application and can easily be extended to modify the benchmark behavior. The TestScriptGenerator creates one test script that is used to benchmark each configuration, allowing random user activity (because typical application usage generally varies) but precise consistency across each execution of the benchmark. The TestCase simulates the business logic, but all database logic is in the SqlProvider and DBRecords. Like the SqlProvider, the TestCase can be extended and modified.
DBRecords are objects that represent a database record; these objects encapsulate how the SqlProvider is used to persist each record to the database.

The BLOB benchmark gives users the ability to repeatedly analyze changes in hardware, operating systems, file systems, SQL statements and database schema for potential bottlenecks. This allows users to fully optimize their application.

Finally, the BLOB benchmark allows users to make database vendor decisions by running their own application and determining its needs. This feature eliminates the guesswork involved in determining whether a particular standard benchmark applies to the user's application. Users now have the ability to gather real data to use when making strategic business decisions.

CHAPTER 4 BENCHMARK FRAMEWORK

This chapter will discuss the necessary features of a database benchmark framework for the purposes of this research. The database benchmark framework will leverage fundamental aspects of object-oriented programming (OOP) to quickly benchmark specific application functionality rather than theoretical functionality. This framework will be built in Java to increase the pool of testable operating systems and database systems. Sections 4.1 through 4.4 describe four objects (two abstract classes and two interfaces) that are designed to be extended by the end user when implementing a new database benchmark. Figure 4.1 shows the usage relationship of the user-defined objects. Each object is described later in the chapter.

An object-oriented language was chosen to provide users with an industry standard method to extend and enhance the benchmark framework. Of the three most commonly utilized object-oriented languages (Java, C++ and C#), Java was chosen for three reasons. First, database connectivity can be encapsulated entirely within one object for each database vendor, a significant improvement over C++.
Second, the Java language provides cross-platform support where C# does not, and C++ requires careful effort. Third, the author of this thesis has extensive development experience in Java with database utilization.

Figure 4.1. The UML diagram for the classes used to create a new benchmark.

4.1 SqlProvider

The SqlProvider is an abstract class that defines three database-related cleanup methods and a binary stream reader. The database cleanup methods are for the Java ResultSet, PreparedStatement, and Connection interfaces. This class is designed to be extended to define all database access methods for the application. Understanding OOP is the key to properly using the SqlProvider to switch between database systems at runtime, thereby reducing costs. There are four steps to implementing the SqlProvider:

Step 1: Determine whether the application requires database-specific SQL statements or transaction management, such as commit and rollback.
Step 2: Extend the SqlProvider to develop a base interface for all database methods utilized by the application. If any methods are known to be standard for all database systems, they should be defined here to maximize code reuse.
Step 3: Implement a version of the SqlProvider defined in Step 2 for each database system requiring non-standard SQL or transaction management.
Step 4: Create configuration files referencing each fully implemented variation of the SqlProvider from Step 2.

The example class shown in Figure 4.2 illustrates the proper use of inheritance and a skeleton implementation of the SqlProvider using the four steps described above.

Figure 4.2. An example base class implementation for the SqlProvider.

Overriding the BaseSqlProvider with database-specific providers is accomplished in a manner similar to the format shown in Figure 4.3.

Figure 4.3. Handling database specific functionality in the SqlProvider.
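The four steps can be illustrated with a minimal sketch. It is not the thesis's own code from Figures 4.2 and 4.3: the method names (close, readStream, updateDocumentSql), the MsSqlProvider and ProviderFactory classes, and the SQL text are all hypothetical, and the sketch assumes a modern Java runtime rather than the Java 1.4.2 used in the study. It only shows the shape of the inheritance: a framework base class, an application-level BaseSqlProvider holding SQL assumed to be standard, a vendor override, and reflective loading by class name so that switching providers needs no recompilation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Framework base class: three cleanup helpers and a binary stream reader. */
abstract class SqlProvider {
    void close(ResultSet rs) {
        if (rs != null) {
            try { rs.close(); } catch (SQLException ignored) { /* best-effort cleanup */ }
        }
    }

    void close(PreparedStatement ps) {
        if (ps != null) {
            try { ps.close(); } catch (SQLException ignored) { /* best-effort cleanup */ }
        }
    }

    void close(Connection c) {
        if (c != null) {
            try { c.close(); } catch (SQLException ignored) { /* best-effort cleanup */ }
        }
    }

    /** Reads a binary stream fully into a byte array. */
    byte[] readStream(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            for (int n = in.read(buf); n != -1; n = in.read(buf)) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}

/** Step 2: application contract, holding SQL assumed standard across vendors. */
class BaseSqlProvider extends SqlProvider {
    /** Statements to run, in order, inside one transaction to update a BLOB row. */
    String[] updateDocumentSql() {
        return new String[] {
            "UPDATE document SET revision_number = ?, bytes = ? "
                + "WHERE doc_info_id = ? AND revision_number = ?"
        };
    }
}

/** Step 3: vendor override that splits the update in two (hypothetical SQL). */
class MsSqlProvider extends BaseSqlProvider {
    @Override
    String[] updateDocumentSql() {
        return new String[] {
            "UPDATE document SET bytes = ? WHERE doc_info_id = ? AND revision_number = ?",
            "UPDATE document SET revision_number = ? WHERE doc_info_id = ? AND revision_number = ?"
        };
    }
}

/** Step 4: instantiate whichever provider class the configuration names. */
final class ProviderFactory {
    static BaseSqlProvider load(String className) {
        try {
            return (BaseSqlProvider) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load provider " + className, e);
        }
    }
}
```

A caller would read the class name from a configuration file and pass it to ProviderFactory.load; code that uses the returned BaseSqlProvider does not need to know which vendor-specific subclass it received.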
The BaseSqlProvider defines the contract of methods that each provider must implement or inherit to be a BaseSqlProvider. A TestCase (or DBRecord, if the user chose to encapsulate each record's usage of the SqlProvider within DBRecord objects) now needs to concern itself only with using a BaseSqlProvider. The benchmark framework instantiates the correct database provider based on the configuration parameters specified by the user, without recompiling any code.

4.2 TestScriptGenerator

The TestScriptGenerator creates the test script, simulating random changes in user activity at script creation time. This provides consistency in each subsequent run of the benchmark and allows changes in system configuration (database, operating system and hardware) to be analyzed. The random variations will simulate a variety of utilization schemes without allowing stochastic errors.

4.3 TestCase

Users implement the TestCase interface to represent each thread in the benchmark. The test script provides the decision-making logic for the TestCase: which actions to perform and the order in which to perform them.

4.4 DBRecord

Users can implement the DBRecord interface for each table in the database to provide an object-oriented representation of the record.

CHAPTER 5 BLOB BENCHMARK

This study will benchmark MySql 4.1, Oracle 10g, IBM DB2 8.1, and MS SQL Server 2000 (MSSQL). The BLOB benchmark provides a simple database schema (see Appendices A through D) that handles the fundamental aspects of a document management product that retains version information. The schema represents a file system with version history for each file. The database stores all document revisions unless the document is permanently deleted, which causes all prior revisions to be deleted. Each revision contains a description, user ID and revision number.

Using the BLOB benchmark allows users to:

Create their own binary BLOB dataset (i.e., any directory structure users want to use in the test) that more precisely reflects their anticipated storage needs;
Run the TestScriptGenerator against the user's dataset to create a personalized test script; and
Run the benchmark in the user's office utilizing their files, hardware, and database configuration.

As a result, users will:

Have a better understanding of the BLOB benchmark than of benchmarks implemented, configured and performed by vendors;
Fully understand the factors affecting the benchmark's results; and
Be able to see how well each database performs with their data, database configuration and hardware.

5.1 SQL Provider

The base SqlProvider implemented for the BLOB benchmark handles all database activities except one for MSSQL, which is the only database that requires specific logic. The one MSSQL-specific change is that the update of a BLOB type field and a clustering index must be completed as two database updates within one transaction rather than as a single update.

5.2 Test Script Algorithm

The initial benchmark information reflects the storage and retrieval of binary files within three size categories: small, medium, and large (see Tables 7.1 through 7.3). Each machine (one or more) contains the same binary files. The test script creates the specified number of thread variations of the BLOB benchmark for each iteration. Each thread performs its specified TestCase on each file in the set defined by the following: (fileSetCount mod threadNumber) * iterationNumber < fileSetCount. For example, with numThreads = 30 and fileSetCount = 5000, the Thread 1 file index set is {1, 31, 61, ..., 4921, 4951, 4981}.

Each thread the test script spawns performs the following operations on the file set. Section 5.3 explains how a test script created by this algorithm is used, and it provides an example.
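The Thread 1 example in Section 5.2 can be reproduced with a short sketch. It assumes, based only on that worked example, that each thread takes every numThreads-th file starting at its own thread number with 1-based indices; the class and method names (FileIndexSets, forThread) are hypothetical and are not taken from the benchmark's source.

```java
import java.util.ArrayList;
import java.util.List;

final class FileIndexSets {
    /**
     * Returns the 1-based file indices handled by one thread, assuming a
     * stride of numThreads starting at threadNumber (as in the worked example).
     */
    static List<Integer> forThread(int threadNumber, int numThreads, int fileSetCount) {
        List<Integer> indices = new ArrayList<>();
        for (int i = threadNumber; i <= fileSetCount; i += numThreads) {
            indices.add(i);
        }
        return indices;
    }
}
```

Calling FileIndexSets.forThread(1, 30, 5000) yields 167 indices that begin 1, 31, 61 and end 4921, 4951, 4981, which agrees with the example set. Because the sets for different threads do not overlap under this assumption, no two threads would operate on the same file.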
Step 1: First, the thread inserts a random number (between 5 and 15) of files from the free list into the database, removes their indices and adds them to the inserted list (in the order selected). Note: The number of files selected for insertion can actually fall below 10 only if fewer than 10 files remain in the insert list (this may happen if the number of iterations is very high).
Step 2: Next, the thread picks a random number (between 5 and 15) of files from the inserted list and adds them to the activity list. Activities 13, 14, and 15 are delete activities that remove the BLOB from the database and add it back into the insertion list.
Step 3: Finally, the thread repeats Steps 1 and 2 until the total desired iterations have been generated.

5.3 TestCase

The thread performs the following five actions for each file in the activity list. The activities happen regardless of whether the file is added to the deletion list (before it is deleted).

A. Increases the size of the file by five percent and updates. The bytes used to extend the file are from the first part of the file.
B. Decreases the size of the file by ten percent and updates.
C. Increases the size of the file by fifteen percent and updates. The bytes used to extend the file are from the first part of the file.
D. Decreases the size of the file by five percent and updates.
E. Swaps the first ten percent of the file with the last ten percent.

The percentages were chosen to increase the overall size of the BLOBs stored in the database. The percentages are five, ten and fifteen because they represent a significant change in the size or makeup of the BLOB. An example test script is shown in Figure 5.1.

Figure 5.1. Example test script.

The test script fragment represents the ninth iteration of the BLOB benchmark, which contains two concurrent threads each running a test case. A breakdown of the actions that would be performed by each thread is given in A and B below.

A. Thread 0
a. Insert files 1023, 3745, 7359, 5383, 115, 3950, and 2507.
b. Get file 708, increase its size by five percent, and update its corresponding database records.
c. Get file 4320, decrease its size by ten percent, and update its corresponding database records.
d. Get file 1199, increase its size by fifteen percent, and update its corresponding database records.
e. Get file 3404, decrease its size by five percent, and update its corresponding database records.
f. Get file 6014, swap the first ten percent with the last ten percent of the bytes, and update its corresponding database records.
Activities 6 through 13 repeat the pattern defined in b through f using modulo 5 arithmetic.

B. Thread 1
Thread 1 follows the same pattern as Thread 0 to determine what activities happen on which files.

5.4 DBRecord

The DBRecord interface was implemented for each database table. This allows the database logic to be encapsulated in the objects representing database records. The framework does not require this; rather, it is one method of encapsulating a database record in an object for more efficient utilization.

5.5 Database Schema

There are four database schemas: MSSQL, DB2, Oracle, and MySql. The schemas differ only in syntax and in the types of fields available with each database. Thus, a discussion of each actual schema is not warranted. The actual schemas are shown in Appendix A for MSSQL, Appendix B for DB2, Appendix C for Oracle, and Appendix D for MySql. The BLOB benchmark's database schema creation pattern is the same for each database, as follows.

The blob_user table contains the user object, and the BLOB benchmark creates one user for each TestCase. The doc_info table contains the static information that is the same across each revision of the BLOB (such as the filename, path and who holds the BLOB's lock) and the number of the latest revision. In addition to the primary key, two indexes are created for the doc_info table: (filename, pathname) and (id, locked_by).
These two indexes exist to reduce table scans because the field combinations are used in SQL WHERE clauses. The final table in the BLOB benchmark schema is the document table. The document table stores the BLOB and all information specific to each particular revision of the BLOB. The revision-specific information includes foreign keys referencing the blob_user id and the doc_info id, the current revision number, a description and a timestamp. In addition to the primary key, indexes are created for the foreign keys doc_info_id and user_id.

CHAPTER 6 NETWORK, SERVERS AND SOFTWARE CONFIGURATIONS

The BLOB benchmark will be performed with a simple configuration consisting of one client and one database server. The client will run the BLOB benchmark, with the database server containing only the database being tested.

6.1 Network

The network configuration for this benchmark is illustrated in Figure 6.1. The client and database servers use a 10/100 Mbps Cat 5e connection to the Netgear switch. A Netgear FS116 unmanaged switch was chosen because it was available and in good working condition. It represents the unmanaged switching equipment generally used by severely cash-strapped organizations. Furthermore, because these two servers are the only devices on the switch, the benefits provided by higher-end managed switches would not be realized.

Figure 6.1. The physical network layout.

6.2 Servers

The servers used in the benchmark are two Dell rack-mount servers. These servers were chosen because they represent entry to mid-level server hardware and were available for use in this thesis. The database server is a Dell PowerEdge 1750 rack-mount server with a 2.8 GHz Intel Xeon processor with 512 KB cache. It contains 2 GB ECC RAM and has a 36 GB 10K RPM SCSI hard disk. It has the following advantages: a high-speed, server-grade CPU; enough RAM; and a high-speed hard disk. The primary disadvantage is that the database disk load is not separated onto multiple hard disks.
Such separation generally requires special database or RAID configurations, but analysis of such configurations is outside the scope of this thesis. The benchmark client is a Dell PowerEdge 1750 rack-mount server with dual 2.8 GHz Intel Xeon processors, each containing 512 KB cache. It utilizes 2 GB ECC RAM and has a 36 GB 10K RPM SCSI hard disk. The same advantages apply to the client server as to the database server, except that the client uses much more CPU and is multithreaded, so it has a second CPU.

6.3 Software

The software configuration consists of the following.

6.3.1 Database Server

The Windows platform was chosen because it allows the user to see resource utilization to verify utilization trends across tests and because of the relative ease of installation for the database software. The database server software is listed below.

Windows 2000 Server
MySql 4.1.7
Oracle 10g
Microsoft SQL Server 2000
IBM DB2 8.1 Enterprise Edition

6.3.2 Client Server

The Linux platform was chosen to show cross-platform utilization and because it was needed on this server once the tests were completed. The client server software is listed below.

Linux - Fedora Core 2
Java Runtime Environment 1.4.2_05-b04

CHAPTER 7 BLOB DATASETS

In order to represent several BLOB usage paradigms, three major distributions in the average size of BLOB datasets were used: small, medium and large. For the purposes of this research, small, medium and large datasets are defined in Tables 7.1, 7.2, and 7.3, respectively. The different sized datasets provide significant performance variations (see Chapter 10). Using the same algorithm, one test script is created for each BLOB dataset.

Table 7.1 shows a statistical analysis of the small BLOB dataset. The small BLOB dataset is designed to show the database performance when storing and retrieving primarily small files. The total dataset consists of 8143 files, of which 7683 are less than 25 KB. For the small BLOB dataset, 1500 users were simulated.

Table 7.1. The Statistics for the Small BLOB Dataset

Total iterations: 100
Simultaneous threads per iteration: 15
Total simulated users: 1500
Total BLOBs inserted: 8143
Total inserted bytes: 253.05 MB
Average inserted BLOB size: 31.82 KB
Total BLOB activities: 12382
Total updated bytes: 314.44 MB
Average activity BLOB size: 26.00 KB
Total BLOBs deleted: 408
Minimum inserted BLOB size: 4 bytes
Maximum inserted BLOB size: 46.44 MB
Minimum activity BLOB size: 21 bytes
Maximum activity BLOB size: 53.41 MB
Initial BLOB size distribution:
    Less than 25 KB: 7683
    25-100 KB: 276
    100-1024 KB: 160
    1-5 MB: 20
    Greater than 5 MB: 4

The medium BLOB dataset (shown in Table 7.2) focuses on analyzing the database performance of BLOBs primarily ranging between 25 KB and 1024 KB. The total disk space utilized on the database increases significantly as the average BLOB size increases. Therefore, the total number of BLOBs and TestCase iterations is reduced for this category.

Table 7.2. The Statistics for the Medium BLOB Dataset

Total iterations: 50
Simultaneous threads per iteration: 10
Total simulated users: 500
Total BLOBs inserted: 2395
Total inserted bytes: 690.87 MB
Average inserted BLOB size: 295.39 KB
Total BLOB activities: 3748
Total updated bytes: 948.14 MB
Average activity BLOB size: 259.04 KB
Total BLOBs deleted: 143
Minimum inserted BLOB size: 8.00 KB
Maximum inserted BLOB size: 29.03 MB
Minimum activity BLOB size: 8.40 KB
Maximum activity BLOB size: 25.88 MB
Initial BLOB size distribution:
    Less than 25 KB: 55
    25-100 KB: 1154
    100-1024 KB: 1128
    1-5 MB: 33
    Greater than 5 MB: 25

Table 7.3 shows a statistical analysis of the large BLOB dataset. The large BLOB dataset provides only large BLOBs ranging from 566.23 KB to 51.09 MB. As with the medium BLOB dataset, the increased average BLOB size restricts the total number of BLOBs and TestCase iterations. All four databases utilized most of the available disk space on the test hardware with this dataset.

Table 7.3. The Statistics for the Large BLOB Dataset

Total iterations: 25
Simultaneous threads per iteration: 5
Total simulated users: 125
Total BLOBs inserted: 244
Total inserted bytes: 2.66 GB
Average inserted BLOB size: 11.17 MB
Total BLOB activities: 376
Total updated bytes: 4.00 GB
Average activity BLOB size: 10.90 MB
Total BLOBs deleted: 7
Minimum inserted BLOB size: 627.19 KB
Maximum inserted BLOB size: 51.09 MB
Minimum activity BLOB size: 566.23 KB
Maximum activity BLOB size: 45.98 MB
Initial BLOB size distribution:
    Less than 25 KB: 0
    25-100 KB: 0
    100-1024 KB: 39
    1-5 MB: 57
    Greater than 5 MB: 148

CHAPTER 8 BENCHMARK VARIATIONS

Two variations of the BLOB benchmark utilize the database in significantly different ways. They represent two different usage paradigms of BLOBs.

The first BLOB benchmark variation uses additional inserts to update the BLOBs in order to maintain revision histories and hereafter will be referred to as the BLOB benchmark with revision history. It simulates a source code control application wherein a document's or image's previous version needs to remain accessible. Each thread creates one user and executes all the inserts and activities defined in the test script (for that thread).

The second variation of the BLOB benchmark, henceforth referred to as the BLOB benchmark without revision history, performs an initial insert of a BLOB and then updates the BLOB field of the initial record for that BLOB. Thus, each BLOB is inserted once and then overwritten with each subsequent revision. Updates utilize different functionality of the database system and affect the underlying file system differently.

The activities are where the benchmarks differ. For the BLOB benchmark with revision history, each activity inserts a modified copy of the BLOB into the database. This creates a larger number of records in the document table and increases the overall disk space required to store the database, which, in turn, creates issues for some database systems.
This variation simulates an application using a database to store BLOBs and maintain the revision history for these applications. Since maintaining the revision history of the BLOB never utilizes the update functionality, only the insertion and deletion of BLOBS into the database is benchmarked. In contrast, the BLOB benchmark without revision history uses the same BLOB modifications to update the initial database record when performing activities. For some database systems, this will use less disk space. Between the two BLOB benchmark variations, users will be able to understand the relative performance differences between updating and inserting BLOBs to a database system. This research will examine both variations to ascertain the relative performance of four major database providers and provide useful data for comparing the insertion and update capabilities for each database with small, medium and large BLOBs. Tables 8.1, 8.2, and 8.3 show the general structure of the database utilized by the BLOB benchmark. Refer to section Appendices A through D for more specific details about the tables and the schema for each database system. The two benchmark variations will utilize the BLOBJJSER and the DOCJNFO tables exactly the same. The usage of the DOCUMENT table will be the same when initially inserting the BLOB into the database and differ when updating and deleting the BLOB as described in the preceding paragraph. Table 8.1. The Database Table Used to Store Each User. BLOB USER Name General Type Length Nullable ID Character 80 No PASSWORD Character 80 No FIRST_NAME Character 80 No LAST_NAME Character 80 No Primary Key ID Table 8.2. The Database Table Used to Store the Common Information Regarding the BLOB. 
DOC_INFO
Name               General Type   Length   Nullable
ID                 Integer                 No
FILENAME           Character      80       No
PATHNAME           Character      255      No
CURRENT_REVISION   Integer                 No
LOCKED_BY          Character      80       Yes
Primary Key   ID
Foreign Key   LOCKED_BY references BLOB_USER.ID
Index         {FILENAME, PATHNAME}
Index         {ID, LOCKED_BY}

Table 8.3. The Database Table Used to Store the Actual BLOB Fields.

DOCUMENT
Name              General Type   Length   Nullable
DOC_INFO_ID       Integer                 No
REVISION_NUMBER   Integer                 No
DESCRIPTION       Character      512      Yes
USER_ID           Character      80       No
REVISION_DATE     Datetime                No
BYTES             BLOB                    No
Primary Key   {DOC_INFO_ID, REVISION_NUMBER}
Foreign Key   DOC_INFO_ID references DOC_INFO.ID ON DELETE CASCADE
Foreign Key   USER_ID references BLOB_USER.ID
Index         DOC_INFO_ID

CHAPTER 9 DATABASE CONFIGURATIONS AND MODIFICATIONS

Each database was originally tested with its default installation parameters. Only necessary modifications to the default configuration of the database systems were performed. MySql is the only database that immediately reported errors. However, DB2 did run out of disk space running the BLOB benchmark with revision history using the large dataset (see Chapter 10). The database configurations are:

9.1 Oracle 10g Default

The Oracle 10g configuration is a standard basic installation with no manual configuration.

9.2 MySql 4.1.7 Default

The MySql configuration is the default unzip install with no manual configuration. Unfortunately, this configuration did not work with the BLOB database benchmark because the default maximum allowed packet size is 1 MB. The 1 MB maximum packet size caused the benchmark to fail when inserting or updating BLOBs larger than 1 MB. Because of this error, MySql required the custom configuration defined as MySql 4.1.7 Packet Size Configuration.

9.3 MySql 4.1.7 Packet Size Configuration

The MySql Packet Size Configuration consists of the default unzip install with the parameter --max_allowed_packet=512m set on database startup.
The Packet Size Configuration successfully completed each of the tests in the BLOB database benchmark. However, MySql 4.1 repeatedly reported the following error, with the checkpoint size varying for each occurrence: 'InnoDB: ERROR: the age of the last checkpoint is 9447127, which exceeds the log group capacity 9433498. Using big BLOB or TEXT rows requires the combined size of log files to be set at least 10 times bigger than the largest such row'. To resolve this issue, the MySql 4.1.7 Log Buffer Configuration was used.

9.4 MySql 4.1.7 Log Buffer Configuration

The MySql Log Buffer Configuration uses the default unzip install with the following parameters set at database startup: --max_allowed_packet=512m, --innodb_log_buffer_size=512m, --innodb_log_file_size=512m, and --innodb_log_files_in_group=4. Once the maximum allowed packet size, log buffer size, log file size and number of log files were configured, MySql was able to complete all three benchmarks without errors in the benchmark or from the database software.

9.5 Microsoft SQL Server 2000

Microsoft SQL Server 2000 is a standard installation with no manual configuration.

9.6 IBM DB2 8.1 Enterprise Edition

The IBM DB2 8.1 Enterprise Edition standard installation was used with no manual configuration.

CHAPTER 10 BENCHMARK RESULTS

The following charts and graphs represent the average and total time spent waiting for the database system. The total time was gathered by calculating the elapsed time to store (insert or update), delete and retrieve the records from the database for each TestCase. The total time represents only the time spent performing database-related activities: acquiring the database connection, creating the query, running the query, reading the query results when applicable, and committing the query. The average time is the total time divided by the number of TestCases. Creating the user record is the only database query that is not directly factored into the BLOB benchmark results.
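The timing rule described above (elapsed time over the database-related steps of each TestCase, with the average taken as the total divided by the number of TestCases) can be sketched as follows. This is a minimal illustration only; time_test_case and summarize are hypothetical names and not part of the benchmark framework, and the callables stand in for whatever database work a TestCase performs.

```python
import time


def time_test_case(steps):
    """Return the seconds spent running the database-related steps of one TestCase.

    `steps` is a list of zero-argument callables, for example: acquire the
    connection, create the query, run it, read the results, commit.
    """
    start = time.perf_counter()
    for step in steps:
        step()
    return time.perf_counter() - start


def summarize(durations):
    """Total is the sum over all TestCases; average is total / number of TestCases."""
    total = sum(durations)
    return total, total / len(durations)


# Usage with made-up durations (seconds) for three TestCases.
total, average = summarize([1.5, 2.5, 2.0])
print(total, average)
```

Keeping the measurement around the database steps only, rather than around the whole simulated user session, is what lets the totals be read as time spent waiting for the database system.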
The time spent waiting for the user-record query was deliberately not included because it provides no useful information about the storage of BLOBs. One user record is inserted into the database for each thread at the beginning of the iteration. This record contains no binary data and represents activities that happen once for each user. The user record represents other activities performed on the database during periods of heavy usage and facilitates the second foreign key constraint in the record containing the BLOB field.

10.1 BLOB Benchmark with Revision History

The BLOB benchmark with revision history shows significant variations in the capability of each database system to store BLOBs; see Tables 10.1 through 10.3 and Figures 10.1 through 10.3 for actual results. For the small dataset (see Figure 10.1), MSSQL performs the best, taking 49 percent less time than Oracle, the nearest competitor. MySql has the worst performance with the small BLOB dataset, taking 320 percent longer than DB2. For the medium dataset, MSSQL again outperforms MySql, Oracle and DB2, taking 48, 49, and 81 percent less time, respectively, as demonstrated in Figure 10.2. With the large BLOB dataset, Figure 10.3 shows that MySql has the best performance at 4 percent faster than MSSQL, 66 percent faster than Oracle and 75 percent faster than DB2.

Table 10.1. Results of the BLOB Benchmark with Revision History for the Small BLOB Dataset.

Database Configuration            Average Duration (Seconds)   Total Duration (Seconds)
Oracle 10g                        3.07                         4608
MySql 4.1 Packet Config           29.35                        44030
MySql 4.1 Packet and Log Config   28.91                        43359
MS SQL Server 2000                1.56                         2346
DB2 8.1 Enterprise Edition        9.59                         14388

Table 10.2. Results of the BLOB Benchmark with Revision History for the Medium BLOB Dataset.
Database Configuration            Average Duration (Seconds)   Total Duration (Seconds)
Oracle 10g                        23.34                        11672
MySql 4.1 Packet Config           22.58                        11288
MySql 4.1 Packet and Log Config   22.71                        11354
MS SQL Server 2000                11.91                        5953
DB2 8.1 Enterprise Edition        63.00                        31499

Table 10.3. Results of the BLOB Benchmark with Revision History for the Large BLOB Dataset.

Database Configuration            Average Duration (Seconds)   Total Duration (Seconds)
Oracle 10g                        737.94                       92242
MySql 4.1 Packet Config           280.82                       35102
MySql 4.1 Packet and Log Config   249.58                       31197
MS SQL Server 2000                260.26                       32532
DB2 8.1 Enterprise Edition        1012.83                      126604

Figure 10.1. Results of the BLOB benchmark with revision history for the small BLOB dataset.
Figure 10.2. Results of the BLOB benchmark with revision history for the medium BLOB dataset.
Figure 10.3. Results of the BLOB benchmark with revision history for the large BLOB dataset.

10.2 BLOB Benchmark without Revision History

The BLOB benchmark without revision history has the database perform an update of the BLOB field for every activity; refer to Tables 10.4 through 10.6 and Figures 10.4 through 10.6 for actual results. This requires a significant modification for MSSQL, as it does not allow the update of a clustering index at the same time as a BLOB field. To address this issue, the BlobSqlProvider class is extended (only for MSSQL) such that the non-binary fields are updated separately from the BLOB field and then the transaction is committed. Other database systems do not require this modification since they can update the BLOB fields and indexes in the same update.

Performance rankings for the small BLOB dataset without revision history, shown in Figure 10.4, remained the same as with revision history; that is, MSSQL first, Oracle second, DB2 third and MySql fourth. For the medium BLOB dataset, MSSQL outperforms MySql, DB2, and Oracle, taking 10, 12, and 74 percent less time, respectively, as depicted in Figure 10.5.
Figure 10.6 shows that for the large BLOB dataset DB2 finishes 32 percent earlier than the nearest competitor (MSSQL) and 81 percent faster than the slowest performer (Oracle).

Table 10.4. Results of the BLOB Benchmark without Revision History for the Small BLOB Dataset.

Database Configuration            Average Duration (Seconds)   Total Duration (Seconds)
Oracle 10g                        4.47                         6698
MySql 4.1 Packet Config           29.21                        43816
MySql 4.1 Packet and Log Config   29.61                        44414
MS SQL Server 2000                1.94                         2903
DB2 8.1 Enterprise Edition        6.98                         10469

Table 10.5. Results of the BLOB Benchmark without Revision History for the Medium BLOB Dataset.

Database Configuration            Average Duration (Seconds)   Total Duration (Seconds)
Oracle 10g                        104.24                       52118
MySql 4.1 Packet Config           27.42                        13709
MySql 4.1 Packet and Log Config   26.30                        13149
MS SQL Server 2000                23.75                        11874
DB2 8.1 Enterprise Edition        26.97                        13484

Table 10.6. Results of the BLOB Benchmark without Revision History for the Large BLOB Dataset.

Database Configuration            Average Duration (Seconds)   Total Duration (Seconds)
Oracle 10g                        1867.84                      233480
MySql 4.1 Packet Config           561.35                       70169
MySql 4.1 Packet and Log Config   541.99                       67749
MS SQL Server 2000                505.31                       63164
DB2 8.1 Enterprise Edition        345.00                       43125

Figure 10.4. Results of the BLOB benchmark without revision history for the small BLOB dataset.
Figure 10.5. Results of the BLOB benchmark without revision history for the medium BLOB dataset.
Figure 10.6. Results of the BLOB benchmark without revision history for the large BLOB dataset.

10.3 BLOB Benchmark Comparison

This section compares the BLOB benchmark with revision history (BLOB inserts only) to the BLOB benchmark without revision history (initial BLOB insert and subsequent updates). As shown in Figure 10.7 (the small BLOB dataset), Oracle and MSSQL increase their waiting time by 45 and 25 percent, respectively, when performing updates instead of inserts. DB2 decreases its waiting time by 27 percent. MySql's performance remained nearly unchanged.
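The small-dataset comparison can be recomputed from the total durations in Tables 10.1 and 10.4. The sketch below is illustrative only: pct_change is a hypothetical helper name, the dictionary layout is an assumption, and the MySql entry uses the Packet and Log configuration.

```python
def pct_change(with_history, without_history):
    """Percent change in waiting time when updating instead of inserting."""
    return (without_history - with_history) / with_history * 100.0


# Total durations in seconds for the small BLOB dataset:
# (Table 10.1 with revision history, Table 10.4 without revision history).
small_totals = {
    "Oracle 10g": (4608, 6698),
    "MySql 4.1 Packet and Log Config": (43359, 44414),
    "MS SQL Server 2000": (2346, 2903),
    "DB2 8.1 Enterprise Edition": (14388, 10469),
}

for name, (inserts, updates) in small_totals.items():
    print(f"{name}: {pct_change(inserts, updates):+.1f}%")
```

Under these assumptions the totals give roughly +45 percent for Oracle, +24 percent for MSSQL, -27 percent for DB2 and +2 percent for MySql, which is in line with the figures quoted in the text; small differences may come from rounding or from using the average rather than the total durations.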
For the medium BLOB dataset (Figure 10.8), DB2 again improves its performance, reducing its waiting time by 57 percent. Oracle, MySql and MSSQL increased their waiting times by 340, 16 and 99 percent, respectively. The large BLOB dataset follows the same trend as the medium BLOB dataset, with DB2 reducing its wait time by nearly 66 percent, Oracle increasing by 153 percent, MySql increasing by 117 percent and MSSQL increasing by 94 percent, as illustrated in Figure 10.9.

Figure 10.7. Comparative results of the two BLOB benchmarks using the small BLOB dataset.
Figure 10.8. Comparative results of the two BLOB benchmarks using the medium BLOB dataset.
Figure 10.9. Comparative results of the two BLOB benchmarks using the large BLOB dataset.

CHAPTER 11 CONCLUSIONS

As with any benchmark, the BLOB benchmark cannot be applied to all applications using the tested database systems. Factors such as the database schema, utilization patterns, network conditions, and hardware will affect the performance of each database for many applications. The BLOB benchmark is designed specifically to test the ability to store binary data in a database. Applications using a database to store BLOBs can be benchmarked with the BLOB benchmark by modifying only the TestCase and SqlProviders. The BLOB benchmark provides general information regarding how well these systems perform when storing BLOBs.

The BLOB benchmark shows significant performance differences among the major database systems in their ability to insert and update BLOBs. It also shows significant trends for several database systems. For example, Oracle's performance becomes significantly worse as the size of the BLOB dataset increases. MSSQL's performance is the best or a close second for BLOBs of all sizes, whether inserting or updating the BLOBs. MySql's performance switches from last with small BLOBs to a tie for best performance when inserting large BLOBs.
The performance of Oracle and DB2 varies greatly between inserting and updating BLOB fields, while MySql and MSSQL are less affected. DB2 has significantly better performance when updating BLOB data than when inserting BLOB data. Conversely, Oracle's performance is greatly reduced when updating BLOBs instead of inserting them. DB2 updates large BLOBs the fastest and inserts them the slowest. DB2 is the only database system that performs BLOB field updates faster than BLOB field inserts regardless of the BLOB size. Compared with the other tested database systems, MSSQL has the most consistent performance and MySql has the poorest performance with the small BLOB dataset.

Since the database utilization of every application is likely to vary rather significantly, this thesis presents a database benchmark framework and a binary large object benchmark developed on this framework. The benchmark framework greatly reduces the overall development time required to create a performance test or benchmark. If the application is already implemented, users can simply create a TestCase that utilizes their application, or implement a TestCase, a SqlProvider (recommended for comparing database systems) and a TestScriptGenerator (recommended for randomly simulating user activities). The BLOB benchmark serves as one database benchmark and as an example of how to reuse the benchmark framework to understand how changes to the data and the application logic affect database performance. Small development shops will be able to use the benchmark framework to significantly reduce the cost of benchmarking their application's database utilization, spend more time enhancing performance, and determine which database system provides the best performance.

REFERENCES

[1] Utah State Law Title 46 Chapter 3. Digital Signature Act. May 1995.
[2] U.S. Public Law 106-229. Electronic Signatures in Global and National Commerce Act. July 2000.
[3] Transaction Processing Performance Council. 2004. September 2004.
[4] Storage Performance Council. (SPC-1) Official Specification, Revision 1.8.0: SPC Benchmark-1. January 10, 2004.
[5] Open Source Database Benchmark. 2004. September 2004.
[6] Benchmark Handbook for Database and Transaction Processing Systems. Ed. J. Gray. 2004. September 2004.
[7] Transaction Processing Performance Council. TPC Benchmark C. 2004. September 2004.
[8] Transaction Processing Performance Council. TPC Benchmark H. 2004. September 2004.
[9] Transaction Processing Performance Council. TPC Benchmark R. 2004. September 2004.
[10] Transaction Processing Performance Council. TPC Benchmark W. 2004. September 2004.

APPENDICES

Appendix A

CREATE TABLE BLOB_USER (
    ID VARCHAR(80) NOT NULL,
    PASSWORD VARCHAR(80) NOT NULL,
    FIRST_NAME VARCHAR(80) NOT NULL,
    LAST_NAME VARCHAR(80) NOT NULL,
    PRIMARY KEY(ID)
);

CREATE TABLE DOC_INFO (
    ID INT NOT NULL,
    FILENAME VARCHAR(80) NOT NULL,
    PATHNAME VARCHAR(255) NOT NULL,
    CURRENT_REVISION INT NOT NULL,
    LOCKED_BY VARCHAR(80),
    PRIMARY KEY(ID),
    FOREIGN KEY(LOCKED_BY) REFERENCES BLOB_USER
);

CREATE INDEX DOC_INFO_FILE_PATH_IDX ON DOC_INFO(FILENAME, PATHNAME);
CREATE INDEX DOC_INFO_ID_LOCK_IDX ON DOC_INFO(ID, LOCKED_BY);

CREATE TABLE DOCUMENT (
    DOC_INFO_ID INT NOT NULL,
    REVISION_NUMBER INT NOT NULL,
    DESCRIPTION VARCHAR(512),
    USER_ID VARCHAR(80) NOT NULL,
    REVISION_DATE DATETIME NOT NULL,
    BYTES IMAGE NOT NULL,
    PRIMARY KEY(DOC_INFO_ID, REVISION_NUMBER),
    FOREIGN KEY(DOC_INFO_ID) REFERENCES DOC_INFO(ID) ON DELETE CASCADE,
    FOREIGN KEY(USER_ID) REFERENCES BLOB_USER(ID)
);

CREATE INDEX DOCUMENT_DOC_INFO_ID_IDX ON DOCUMENT(DOC_INFO_ID);

Appendix B

CREATE TABLE BLOB_USER (
    ID VARCHAR(80) NOT NULL,
    PASSWORD VARCHAR(80) NOT NULL,
    FIRST_NAME VARCHAR(80) NOT NULL,
    LAST_NAME VARCHAR(80) NOT NULL,
    PRIMARY KEY(ID)
);

CREATE TABLE DOC_INFO (
    ID BIGINT NOT NULL,
    FILENAME VARCHAR(80) NOT NULL,
    PATHNAME VARCHAR(255) NOT NULL,
    CURRENT_REVISION BIGINT NOT NULL,
    LOCKED_BY VARCHAR(80),
    PRIMARY KEY(ID),
    FOREIGN KEY(LOCKED_BY) REFERENCES BLOB_USER
);

CREATE INDEX DOCINFOFILEPATHIDX ON DOC_INFO(FILENAME, PATHNAME);
CREATE INDEX DOCINFOIDLOCKIDX ON DOC_INFO(ID, LOCKED_BY);

CREATE TABLE DOCUMENT (
    DOC_INFO_ID BIGINT NOT NULL,
    REVISION_NUMBER BIGINT NOT NULL,
    DESCRIPTION VARCHAR(512),
    USER_ID VARCHAR(80) NOT NULL,
    REVISION_DATE TIMESTAMP NOT NULL,
    BYTES BLOB(100M) NOT NULL,
    PRIMARY KEY(DOC_INFO_ID, REVISION_NUMBER),
    FOREIGN KEY(DOC_INFO_ID) REFERENCES DOC_INFO(ID) ON DELETE CASCADE,
    FOREIGN KEY(USER_ID) REFERENCES BLOB_USER(ID)
);

CREATE INDEX DOC_DOCINFOIDIDX ON DOCUMENT(DOC_INFO_ID);

Appendix C

CREATE TABLE BLOB_USER (
    ID VARCHAR2(80) NOT NULL,
    PASSWORD VARCHAR2(80) NOT NULL,
    FIRST_NAME VARCHAR2(80) NOT NULL,
    LAST_NAME VARCHAR2(80) NOT NULL,
    PRIMARY KEY(ID)
);

CREATE TABLE DOC_INFO (
    ID NUMBER NOT NULL,
    FILENAME VARCHAR2(80) NOT NULL,
    PATHNAME VARCHAR2(255) NOT NULL,
    CURRENT_REVISION NUMBER NOT NULL,
    LOCKED_BY VARCHAR2(80),
    PRIMARY KEY(ID),
    FOREIGN KEY(LOCKED_BY) REFERENCES BLOB_USER(ID)
);

CREATE INDEX DOC_INFO_FILE_PATH_IDX ON DOC_INFO(FILENAME, PATHNAME);
CREATE INDEX DOC_INFO_ID_LOCK_IDX ON DOC_INFO(ID, LOCKED_BY);

CREATE TABLE DOCUMENT (
    DOC_INFO_ID NUMBER NOT NULL,
    REVISION_NUMBER NUMBER NOT NULL,
    DESCRIPTION VARCHAR2(512),
    USER_ID VARCHAR2(80) NOT NULL,
    REVISION_DATE DATE NOT NULL,
    BYTES LONG RAW NOT NULL,
    PRIMARY KEY(DOC_INFO_ID, REVISION_NUMBER),
    FOREIGN KEY(DOC_INFO_ID) REFERENCES DOC_INFO(ID) ON DELETE CASCADE,
    FOREIGN KEY(USER_ID) REFERENCES BLOB_USER(ID)
);

CREATE INDEX DOCUMENT_DOC_INFO_ID_IDX ON DOCUMENT(DOC_INFO_ID);

Appendix D

CREATE DATABASE BLOBDB;
USE BLOBDB;
GRANT ALL PRIVILEGES ON *.* TO BLOBDB@"10.161.101.33" IDENTIFIED BY 'BLOBDB' WITH GRANT OPTION;

CREATE TABLE BLOB_USER (
    ID VARCHAR(80) NOT NULL,
    PASSWORD VARCHAR(80) NOT NULL,
    FIRST_NAME VARCHAR(80) NOT NULL,
    LAST_NAME VARCHAR(80) NOT NULL,
    PRIMARY KEY (ID)
) ENGINE=INNODB;

CREATE TABLE DOC_INFO (
    ID INT NOT NULL,
    FILENAME VARCHAR(80) NOT NULL,
    PATHNAME VARCHAR(255) NOT NULL,
    CURRENT_REVISION INT NOT NULL,
    LOCKED_BY VARCHAR(80),
    INDEX DOC_INFO_LOCKED_BY_IDX (LOCKED_BY),
    PRIMARY KEY (ID),
    FOREIGN KEY (LOCKED_BY) REFERENCES BLOB_USER (ID) ON DELETE CASCADE
) ENGINE=INNODB;

CREATE INDEX DOC_INFO_FILE_PATH_IDX ON DOC_INFO(FILENAME, PATHNAME);
CREATE INDEX DOC_INFO_ID_LOCK_IDX ON DOC_INFO(ID, LOCKED_BY);

CREATE TABLE DOCUMENT (
    DOC_INFO_ID INT NOT NULL,
    REVISION_NUMBER INT NOT NULL,
    DESCRIPTION TEXT,
    USER_ID VARCHAR(80) NOT NULL,
    REVISION_DATE DATETIME NOT NULL,
    BYTES LONGBLOB NOT NULL,
    INDEX DOCUMENT_DOC_INFO_ID_IDX (DOC_INFO_ID),
    INDEX DOCUMENT_DOC_INFO_USER_ID_IDX (USER_ID),
    PRIMARY KEY (DOC_INFO_ID, REVISION_NUMBER),
    FOREIGN KEY (DOC_INFO_ID) REFERENCES DOC_INFO (ID) ON DELETE CASCADE,
    FOREIGN KEY (USER_ID) REFERENCES BLOB_USER (ID)
) ENGINE=INNODB;
