Data Structure - Tree-Based Structures and Hash Tables Coursework Example | Topics and Well Written Essays

ABSTRACT Data structure is the fundamental building block of databases. The proper selection of a data structure helps users access and manipulate records in a database efficiently in less time. The most common and widely used data structure for databases are Tree-based structures such as B-Trees and Hash Tables. These structures are helpful particularly in organizing data with hierarchical relationship between different entities since every segment of the ‘Tree” structure represents an entity of a real life systems. A B-Tree data structure is ideal for saving huge amounts of data that requires frequent manipulation such as insertion, deletion, and search. However, B-Tree’s performance significantly depends on the number of nodes from the root to leaf and therefore spend considerable time particularly in search operations. Hash Tables provide the best and quickest key search as it allows complete random access to every in the tree. It conserved memory by eliminating pointers and gives comparatively direct access to the data stored. They usually use less space than other tree-based indexes. However, Hash Table has its own limitations, as they are less predictable and vulnerable to sporadic data collision. 1. Introduction Physical database design is really motivated by data volume. A database with a few rows of data really has no issues with physical design database design, and the performance of applications that access a tiny database cannot be deeply affected the physical design of the underlying system. However, as data volumes rise, the physical structures that underlie its access patterns become increasingly critical. Since efficiency is measured by counting the total number of disk accesses made, a good data structure is essential. The focus of this paper is to explore the kinds of data structure used to store information in a database and these data structures’ advantages and disadvantages. 2. Description of the Subject 2.1 Data Structure for Database Data structure is the central element of databases. It is the product of the application of certain tools and techniques developed to link data items within records and between records of the same file and between records of various files. The right choice and design of data structure facilitate efficient and effective access and manipulation of records in a database (Panneerselvam 40). The data structures connecting records of various files and connecting the records of a file for a given record of a different file can be categorized as stack data structure, queue data structure, sorted list data structure, ring data structure, inverted list data structure, multi-list data structure, and tree data structure. Although some provide multidimensional data structures such as variants of quad trees and R-trees and a few also support bit vectors and multi-table join indexes, most database systems support B-Trees, and Hash tables. In practice, these two data structures are most often used in databases (Shasha and Bonnet 82). 2.2 The Tree Structure A ‘Tree’ is a unique breed of network in which there is no cycle. It is helpful in organizing data particularly in circumstances when there is hierarchical relationship between several entities. The tree structure is in fact the inverted image of an actual tree where the root, at the top, is at level zero. Except for terminal nodes, there are a set or ‘fillial set’ of offspring or siblings at every node in the tree and each terminal node is considered a ‘leaf’. When an access path in a tree structure commenced from the root node, it constantly moves downward until it encounter a terminal node. When a database employ a tree structure, the set of nodes at each level will correspond to a segment thus the quantity of segments in a tree will be similar to the number of levels in a simple tree. Since a multiple tree may have one or more than one segment at some levels of the tree, there are more segments in a multiple tree. Every path beginning from the root node to each terminal node or leaf is regarded as a record in the tree data structure since every segment of the tree symbolizes an entity of a real life system. However, the number of files in a tree-based database is only one. There are two types of tree data structure, a binary tree and B-tree or a Balanced Tree. The maximum number of siblings in the binary tree for a non-leaf node is two. This means that the maximum number of branches originating from any non-leaf node is two. The number of branches originating from a node in a B-Tree is limitless. However, on average, the number is roughly similar for non-leaf nodes, which allow access to any record in a tree within a specified number of accesses. In reality, to retrieve all keys in one of the terminal nodes, a user can resolve the size of the filial set and the number of levels of the B-Tree for a given number of accesses backward (Panneerselvam 61). 2.3 The B-Tree Data Structure B-Tree is an effective tree data structure that makes fast access in virtually all database systems possible (Tucker 19). A B-tree data structure is perfect for saving huge amounts of information that requires insertion, deletion, and fast query. As we mentioned earlier about the ‘Tree’ data structure, a B-tree has a tree like structure in which all of the bottommost nodes or leaf nodes are the same number of levels away from the top or root of the tree. This property is preserved even when new data is added or deleted from the indexed column (Petkovic 213). However, there are principles that developers manipulating a B-tree structure must realize. They must understand the reading a tree structure in memory is much faster than reading a similar structure on a disk. In addition, they must realize that it best to read the data in 4K pages when performing read and write operations on a disk. To minimise the number of disk operations, it is therefore necessary to optimise the arrangement of your tree (McBee and Gerber 117). “B-tree is a balance tree whose nodes contain a sequence of key-pointer pairs” where the keys are arranged by value (Tucker 9). B-trees are self-organizing through operations recognized as ‘splits’ and ‘merges’ although there are occurrence of sporadic restructuring to lessen the number of seek. Additionally, B-Trees support various types of query. The effective functioning of a B-tree rely significantly on the number of nodes in the typical path from root to leaf, as access to disk secondary memory spend about 5ms in ‘seek’ operation. The number of nodes in the path is known as the number of levels (Tucker 9). A technique use by DBMS to lessen the number of levels in many B-Tree implementations is to make each internal node have as many offspring as possible. A ‘fan-out’ is the maximum number of children a node can contain and since a B-tree node comprised of key-pointer pairs, the larger the key is, the lesser the fan-out. For instance, a B-tree with a million records and a fan-out of 10 requires seven levels. Thus, if we expand the number of records to a billion then number of levels increases to four and ten, correspondingly. Generally, accessing data using small keys are faster than through indexes with large keys (Tucker 9). 2.4 The Hash Table Structure Dissimilar from a B-Tree structure, Hash structures are method of storing key-value pairs based on a pseudo-randomizing functions called a hash function. The hash function can be though of as the root of the structure. 1 2 3 I(3) 4 5 I(5) 6 Using a key, the hash function seek and returns a location that contains either a page address or a directory location holding a set of page addresses. That page holds the key and related record or is the first page of linked list of pages, known as the ‘overflow chain’ pointing to the records holding the key. When an overflow chain is not available, a hash structure can respond to similarity queries in a single disk access, making it the ideal data structures in equality database search. The hash function will return randomly a range of locations on key values that are close but dissimilar, for instance ‘Smith’ and ‘Smythe’. Consequently, records holding such close keys probably are located on a different page. This is the reason why hash structures are “completely helpful for range and min-max queries” (Tucker 9). 3. Critical Appraisal Generally, application software manages data and since data has to be saved, there are three levels of storage. The main memory is the primary storage, which the computer CPU deals directly. However, although access data stored in this level is quick, storing large information on memory is not practical since it is volatile and more expensive. Although, access to a magnetic disk is slower, it is commonly used as a secondary storage because it is non-volatile and less expensive. Tapes are used as a tertiary storage particularly on data that do not require frequent access (Mehta and Sahni 4). Normally, data has to read from disk to memory and the data stored on disk are in units called blocks or pages. A disk has to read or write one or multiple blocks whenever we access a database. Consequently, we need to read the whole block even if we only need to retrieve a single integer stored in a disk block that probably contains thousands of integers. In a database, the basic issue is reducing the time it takes to retrieved information from a disk and since efficiency is measured by counting the total number disk access, organizing data on a disk is such a way that they can efficiently be updated and retrieved is very important (Mehta and Sahni 4). As the file increase, we can expect the degradation of ordered indexes performance. However, sporadic restructuring of the entire file can solve this difficulty to some extent but it is not practical since the system need to be unavailable while the file is being restructured. Realistic approaches to this kind of problem are the ‘Tree’ data structures and the basic types of tree data structures are the B+-Tree and B-Tree (Khosrowpour 674). When data volume is great and does not fit in memory, “an extension of the binary search tree to disk-based environment is the B-Tree” (Mehta and Sahni 1). Because each disk access requires transfer of whole block of information between memory and disk instead of several bytes, a node of the B-Tree is developed to hold two child pointers or more until it reaches the maximum block capacity. However, with the exception of the root, the B-Tree requires that every node have to be in any case half-full to perform the operation effectively in less time (Mehta and Sahni 1). For instance, supposed that we need to locate a record in a large database using object-oriented programming, the accustomed technique is to use a chain of ‘if’ statements to evaluate each record in the database with the object preference. Performing such comparison in a large database may result in performance issues since it will certainly take considerable time to retrieve the required information. If this is the case, the data structure that will best meet our needs is a hash table. This is because on average, only about one comparison is required to find an object in a hash table. “Hashing is one of many addressing techniques used to find a record based on its unique key value (Gudukbay 236)”. However, the disadvantage of a hash table is that when you fetch objects successively, the order of the objects is not foreseeable. Furthermore, the time it takes to find an object in most other data structures varies with number of objects in the data structure (Grand and Merrill 427). Hash tables have the advantage of allowing complete random access to every node in the tree provided the maximum node depth is kept low. Hash table conserved memory by eliminating pointers. Another advantage of hash tables is that they offer the opportunity to store a non-tree structure (Williams 88). Hash table addressing gives comparatively direct access to the data stored and it can deal with locally adapted data in a simple way (D’Hollander et. al. 589). While hash-based indexes normally provide some of the best and quickest key search, they are somewhat inflexible and unpredictable than other indexes. They are less flexible because range-based queries cannot use the index. Good hash function produces very unusual values for similar values, thus the server cannot make any inference regarding the sequence of data within the index structure. Moreover, in the hash table, records that are close to each other are seldom similar. Hash indexes are less predictable because the wrong combination of data and hash function can result in “a hash table in which most of the records are clumped into just a few buckets” (Zawodny and Balling 70). In this kind of situation, performance degrades significantly because the computer must scan a large list instead of sorting through a relatively small list of keys that share identical hash value, (Zawodny and Balling 70). Hashing is not precise since collision occurs sporadically when two dissimilar keys hash into the same hash value and are assigned to the same array element (Keogh and Davidson 219). However, hash indexes operates reasonably fine for most text and numeric data types and since hash functions efficiently diminish unreasonably sized keys to a small hash value, they normally use less space than many tree-based indexes (Zawodny and Balling 70). 4. Conclusion Databases need a well-organized data structure to effectively access and manipulate records stored in them. Although there are number of data structures types, most databases support and often use Tree-based data structures such as B-Trees and Hash Tables. A B-tree data structure is ideal for storing large amounts of data that often needs deletion, insertion, and quick retrieval. However, the performance of B-Trees depends on the number of nodes in the average path from root to the leaf thus the more nodes it has the slower it performs. An alternative structure is a hash table where records are search using a unique key value. Unlike B-Trees, it allows complete random access to every node in the tree and conserves memory because it does not require pointers. However, hashed-based indexes are not flexible and unpredictable than other indexes. Hash is not precise since collision of two dissimilar key hashes is very likely. However, since hash eliminates large keys and normally uses less space than other tree-based structures, it is probably the most practical data structure for databases. 5. Reference Cormen Thomas. Introduction to Algorithms.U.S: MIT Press, 2003 D'Hollander, Joubert G. R., Völpel R., Peters F. J., and Trottenberg U. Parallel Computing: Fundamentals, Applications and New Directions. Netherlands: Elsevier, 1998 Grand Mark and Merrill Brad. Visual Basic Design Patterns: Reusable Design Patterns Illustrated. U.S.: John Wiley and Sons, 2005 Gudukbay U. Advances in Computer and Information Sciences '98. Turkey: IOS Press, 1998 Haines Steven and Potts Stephen. Java 2 Primer Plus. U.S.: Sams Publishing, 2002 Keogh James and Davidson Ken. Data Structures Demystified: A Self-Teaching Guide. U.S.: McGraw-Hill Professional, 2004 Khosrowpour Mehdi. Innovations Through Information Technology: 2004 Information Resources Management Association International Conference, New Orleans, Louisiana, USA, May 23-26, 2004, U.S.: Idea Group Inc (IGI), 2004 McBee Jim and Gerber Barry. Microsoft Exchange Server 2003: 24 Seven. U.S.: Wiley, 2004 Mehta Dinesh and Sahni Sartaj. Handbook of Data Structures and Applications. U.S.: CRC Press, 2005 Panneerselvam R. Database Management Systems. India: PHI Learning Pvt. Ltd., 2004 Petkovic Dušan. SQL Server 2000: A Beginner's Guide. U.S.: McGraw-Hill Professional, 2000 Roman Steven. Access Database Design and Programming. U.S.: O'Reilly, 2002 Shasha Elliott Dennis and Bonnet Philippe. Database Tuning: Principles, Experiments, and Troubleshooting Techniques. U.S.: Morgan Kaufmann, 2003 Tucker Allen. Computer Science Handbook: Handbook. U.S.: CRC Press, 2004 Williams Ross Neil. Adaptive Data Compression. U.S.: Springer, 1990 Zawodny Jeremy and Balling Derek. High Performance MySQL: Optimization, Backups, Replication, and Load Balancing. U.S.: O'Reilly, 2004 Read More

Data Structure - Tree-Based Structures and Hash Tables - Coursework Example

Extract of sample "Data Structure - Tree-Based Structures and Hash Tables"

CHECK THESE SAMPLES OF Data Structure - Tree-Based Structures and Hash Tables

A Qualitative Structure Activity Relationship

The Limitation and Strength of the Database

Structured Equity Derivatives

Relational Database Management System

A Knowledge Management Framework for Expert Decision Making

Comparing Vectors and Lists in R

Information Systems in Business - Wal-Mart

AutoCAD Drawing of Final Design Choice