A database is a collection of data or information. A database management system (DBMS) is a program that manages the data in a database. It is a computerized record-keeping system that stores, maintains, and provides access to information. A database could be as simple as a phone book or stock tables, or as sophisticated as a biological repository with terabytes of data. Relational DBMSs are those that follow the relational data model described by E. F. Codd. Object-oriented database management systems (OODBMSs) are those that store objects directly, or use object-relational mapping technology to store objects instead of simple data entities. A database system involves four major components:
The primary purpose of a DBMS is to allow a user to store, update, and retrieve data in abstract terms, and thus to make it easy to maintain and retrieve information from a database. A DBMS relieves the user from having to know the exact physical representation of the data and from having to specify detailed algorithms for storing, updating, and retrieving it. A DBMS is usually a very large software package that carries out many different tasks, including providing facilities that enable the user to access and modify information in the database. The DBMS is an intermediate link between the physical database, the computer, and the operating system on the one hand, and the users on the other.
To provide various facilities to different types of users, a DBMS normally provides one or more specialized programming languages, often called database languages. Different DBMSs provide different database languages; however, Structured Query Language (SQL) is the de facto standard. Database languages come in different forms. One language is needed to describe the database to the DBMS, to change the database, and to define and change its physical data structures. Another is needed to manipulate and retrieve the data stored in the DBMS. These languages are called the Data Definition Language (DDL) and the Data Manipulation Language (DML), respectively.
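As a minimal sketch of the DDL/DML distinction, the following uses Python's built-in `sqlite3` module with an in-memory SQLite database. The `employee` table, its columns, and the sample rows are hypothetical, and SQL dialects differ between DBMSs, so treat this as illustrative rather than vendor-specific. The `CREATE TABLE` and `ALTER TABLE` statements are DDL; the `INSERT`, `UPDATE`, and `SELECT` statements are DML, and none of them say anything about how rows are laid out on disk.

```python
import sqlite3


def demo_ddl_dml():
    # In-memory SQLite database; table, columns, and rows are hypothetical examples.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # DDL: describe the structure of the data to the DBMS...
    cur.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT)")
    # ...and change that structure later.
    cur.execute("ALTER TABLE employee ADD COLUMN salary REAL")

    # DML: insert, update, and retrieve data in abstract terms,
    # without specifying how the DBMS stores or searches it.
    cur.executemany(
        "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
        [("Ada", "R&D", 5200.0), ("Grace", "R&D", 6100.0), ("Alan", "Ops", 4800.0)],
    )
    cur.execute("UPDATE employee SET salary = salary * 1.1 WHERE dept = ?", ("Ops",))
    cur.execute(
        "SELECT dept, COUNT(*), ROUND(AVG(salary), 2) "
        "FROM employee GROUP BY dept ORDER BY dept"
    )
    rows = cur.fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo_ddl_dml():
        print(row)
```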
The latest development in the field of database management systems relates to object-oriented relational database management systems (OORDBMSs). These are based on the well-established concepts of object-oriented analysis and design (OOAD) and simplify data storage, access, and retrieval while also supporting effective data manipulation.
Oracle Corporation, IBM, and Microsoft are leading DBMS vendors, each offering a range of products to suit varied organizational requirements.
Data mining, or Knowledge Discovery in Databases (KDD) as it is also known, is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. It encompasses a number of different technical approaches, such as clustering, data summarization, learning classification rules, finding dependency networks, analyzing changes, and detecting anomalies. Data mining is concerned with analyzing data and deploying software techniques to uncover patterns and regularities in data sets. The idea is that it is possible to discover patterns and relationships in unexpected places, because data mining software extracts patterns that were not previously discernible or obvious. Data mining is the search for relationships and global patterns that exist in large databases but are hidden among the vast amount of data, such as a relationship between the temperature of a room and the productivity of an employee. These relationships represent valuable knowledge about the database and about the objects in it that relate to an organization or to its internal or external environment.
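Clustering, the first technique listed above, can be sketched with plain k-means (Lloyd's algorithm) in pure Python. This is one common clustering method, not the only one. The customer data points (visits per month, average spend) are hypothetical, and initializing the centroids with the first k points is a simplification chosen only to keep the run repeatable.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Plain k-means (Lloyd's algorithm). Initial centroids are the first k points,
    a simplification for repeatability rather than a recommended seeding strategy."""
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return assignment, centroids


if __name__ == "__main__":
    # Hypothetical customers: (visits per month, average spend)
    customers = [(1, 20), (2, 22), (1.5, 18), (10, 80), (11, 85), (9.5, 78)]
    labels, centers = kmeans(customers, k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
    print(centers)
```

On these points the algorithm separates the low-activity and high-activity customers into two groups without being told in advance which group any point belongs to.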
Data mining analysis tends to work upwards from the available data, and the best techniques are those developed with an orientation towards large volumes of data, making use of as much of the collected data as possible to arrive at reliable conclusions and decisions. The analysis process starts with a set of data and uses a methodology to develop an optimal representation of the structure of that data, during which knowledge is acquired. This knowledge can then be extended to larger data sets, on the assumption that the larger data set has a structure similar to the sample data. Again, this is analogous to a mining operation in which large amounts of low-grade material are sifted through in order to find something of value: the mining process begins with the raw data and terminates with the extracted knowledge.
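The sample-then-extend idea can be illustrated with a very simple form of anomaly detection: a z-score rule (mean plus or minus three standard deviations) is learned from a random sample and then applied to the whole data set. The transaction amounts, the two injected outliers, the sample size, and the threshold of 3 are all assumptions for illustration. Real data mining tools use far richer models, and a rule like this only transfers if the full data set really does resemble the sample.

```python
import random
import statistics


def learn_normal_range(sample, z=3.0):
    """Learn a simple rule from a sample: values within mean +/- z standard deviations are 'normal'."""
    mu = statistics.mean(sample)
    sigma = statistics.stdev(sample)
    return mu - z * sigma, mu + z * sigma


def flag_anomalies(data, bounds):
    """Apply the learned rule to the full data set and return the values outside the range."""
    low, high = bounds
    return [x for x in data if x < low or x > high]


def run_demo(seed=42, sample_size=500):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Hypothetical transaction amounts: ordinary values around 100 plus two injected outliers.
    full_data = [rng.gauss(100, 10) for _ in range(10_000)] + [250.0, -40.0]
    # Knowledge is acquired from a sample only, then extended to the whole data set.
    sample = rng.sample(full_data, sample_size)
    bounds = learn_normal_range(sample)
    return bounds, flag_anomalies(full_data, bounds)


if __name__ == "__main__":
    bounds, flagged = run_demo()
    print(f"normal range learned from sample: {bounds[0]:.1f} to {bounds[1]:.1f}")
    print(f"{len(flagged)} values flagged; outliers caught:", 250.0 in flagged and -40.0 in flagged)
```

Besides the two injected outliers, a few ordinary values from the tails of the distribution will also be flagged, which is the usual trade-off of a fixed threshold.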
The past two decades have seen a dramatic increase in the amount of information or data stored in electronic format, and this accumulation has taken place at an explosive rate. It has been estimated that the amount of information in the world doubles every 20 months, and that the size and number of databases are increasing even faster. The increasing use of electronic data-gathering devices, such as point-of-sale or remote sensing devices, has contributed to this explosion of available data. New machine learning methods for knowledge representation, based on techniques such as logic programming, were also introduced alongside traditional statistical analysis of data. These new methods tend to be computationally intensive, hence the demand for more processing power. It was recognized that information is at the heart of business operations and that decision-makers could use the stored data to gain valuable insight into the business. A DBMS provides access to the stored data, but this is only a small part of what could be gained from it. Traditional online transaction processing (OLTP) systems are good at inserting data into databases quickly, safely, and efficiently, but not at delivering meaningful analysis in return. Analyzing data can provide further knowledge about a business by going beyond the data explicitly stored. Data mining, or knowledge discovery in databases, offers an organization significant benefits in the area of analysis.
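As an illustration of deriving knowledge that is not explicitly stored, the sketch below counts how often pairs of items appear together in the same transaction and reports the pairs whose support (the fraction of transactions containing both items) meets a threshold. This is the support-counting step used in market-basket (association) analysis; the function name `frequent_pairs`, the baskets, and the 0.4 threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (share of transactions containing both) >= min_support."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) counts each pair once per basket, in a canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: count / n for pair, count in counts.items() if count / n >= min_support}


if __name__ == "__main__":
    # Hypothetical point-of-sale transactions, as an OLTP system might record them.
    baskets = [
        ["bread", "milk"],
        ["bread", "butter", "milk"],
        ["beer", "chips"],
        ["bread", "butter"],
        ["milk", "bread", "eggs"],
    ]
    print(frequent_pairs(baskets, min_support=0.4))
    # {('bread', 'milk'): 0.6, ('bread', 'butter'): 0.4}
```

No single stored row records that bread and milk tend to be bought together; the relationship only emerges from analyzing all of the transactions.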