MIND

(METU INteroperable Database Management Systems)

Summary

MIND is an interoperable DBMS under development at SRDC. Some of the publications and information on the researchers involved in the project are accessible via the Internet.

The project will develop a framework and a set of tools to provide interoperability between existing database management systems, both relational and object-oriented. The project will enable existing systems to work together easily in a cooperative and corporative [Bro 93] manner. While the autonomy of existing systems is preserved, users will also have the notion of a logically single federated system [SL 90, Tho 90, Tro 93]. The project will pay special attention to two issues. The first is the integration of component schemas [SLCN 88, SM 89, BLN 86, HR 90, Ken 91, NS 88, SN 88, FKN 91, NK 89, DH 84, SM 91] with minimum user intervention, in order to achieve semantic interoperability, which is the most important goal. The second is the distribution and optimization of queries [LST 91, MS 91, YC 84] in federated systems in the presence of the query optimizers of the participating systems.

Objectives

Thousands of heterogeneous information resources on various platforms could immediately be made available to many users through the existing computer networks. However, database autonomy and heterogeneity still form a severe bottleneck for the development of effective interoperable information systems. Thus, there is a growing need [Can 91] for tools to maximize the portability, reusability and interoperability of arbitrary computing services while keeping the autonomy of the pre-existing databases in a federated approach. Database providers participating in such a federation do not lose control over their data, and ideally do not have to re-engineer their database management systems and applications to allow interoperability. One of the main objectives of this project is to close the gap between industrial developments and existing needs, so as to provide for semantic interoperation among pre-existing and new technological achievements. A reasonable way of realizing this objective is to develop a comprehensive framework for the integration of the component schemas of the participating database systems, reconciling semantic and structural differences. Once a platform exists that allows semantic interoperation and easy integration of participating information systems, giving a logically uniform and single view of a DBMS, flexible and efficient access to and retrieval of information becomes more important. At this stage, the distribution and optimization of queries on such a platform becomes an important problem, one which has not yet been fully explored.

Project Contribution to Eureka Aims and Objectives

The overall aim of the project, providing interoperability between existing and new technologies, has been one of the most challenging issues in the field of Database Management Systems. This is also reflected in the dedicated research and development programs which have been started in the United States and Japan. The specific contributions of this project to the area of information technology will be twofold: there will be a powerful and extensible commercial product, and the project will involve the development and definition of standards serving the needs of interoperability. The project will use innovative technology. The transfer of proven technology and methodologies will speed up the motivation for and adoption of standards in the field of Database Systems.

Dependencies and Relationships (Esprit Projects)

IRO-DB intends to develop a set of tools to achieve interoperability of pre-existing databases and object-oriented databases. It will concentrate on providing a C++ library for integrated access to heterogeneous databases, supported by communication protocols to exchange object-oriented SQL commands and objects. Tools to design and dynamically maintain integrated applications on large federations of heterogeneous databases will also be developed in this project. Although the aim of the MIND project is the same as that of the IRO-DB project, the MIND project will use world-wide accepted standards such as CORBA [CORBA 92] and ODMG [ODMG 94].

I-BITE is intended as an organized forum bringing together European industrial, business and Information Technology actors interested in the application of available and emerging solutions to problems of Interoperability in information systems. We plan to get feedback from the conclusions drawn in I-BITE.

IMPRESS develops a persistent multimedia object-oriented programming language to support sophisticated applications requiring the interoperability of objects and methods possibly located at different sites. The MIND project will go beyond IMPRESS by providing semantic interoperation between heterogeneous database management systems.

EDS II is developing a parallel machine whose main application is a parallel database server. The query language of the database server, called ESQL [GV 92], extends SQL to support, in particular, complex objects, methods and generalization. The features of this language will aid the design and implementation of the SQL-like language to be developed in the MIND project.

FIDE focuses on four main areas: Definition of type systems for bulk types, specification of a canonical object store, design methodology and transaction processing. The experience of the FIDE project will aid us in the design of the canonical data model in the MIND project.

Baselines and Rationale of the Project

Motivations

To reach semantic database interoperation, new approaches should be developed for integrating existing modelling, methodological and architectural solutions for the components of federated multidatabase systems. To that end, the databases themselves should supply sufficient information to locate the databases relevant to an application, to create a coherent context for the application and the databases chosen, and to construct the consistent multidatabase views supporting the application, avoiding reliance on human expertise as far as possible. These approaches require specific methods for human/machine reasoning leading to the semantic integration of a collection of databases from the federation within the frame of a specific application (a new information system). There is also a need for a richer and more extensible uniform data model to meaningfully exchange and merge data and application semantics. An object model is needed to cope with the current trends. Database management systems are used extensively and have a huge market share. Making these systems interoperable is indispensable in order to make more effective use of the data stored. Most of the commercial systems provide dedicated gateways to other DBMSs; however, these gateways provide very restricted interoperability, in the sense that they provide only structural (table-oriented) interfaces while requiring a complete reengineering of the applications.

Distributed Database Systems

Commercially available technology offers inadequate support both for integrated access to multiple databases and for integrating multiple applications into a comprehensive framework. Homogeneous distributed database management systems require a complete change of the organizational structure of the existing database management systems to cope with heterogeneity.

The Federated Approach

The federated approach to multidatabase management [SL 90] has evolved over a decade to overcome these limitations. Unlike homogeneous distributed database management systems (DBMS), a federated DBMS adds layers of software on pre-existing heterogeneous DBMSs. These layers provide for syntactically uniform export schemas and data manipulation languages, and also for semantically integrated schemas with global transaction management and concurrency control over multiple sites. Most importantly, this approach does not violate the autonomy of the pre-existing databases, i.e., database providers participating in a federation do not lose control over their data, and ideally do not have to re-engineer their DBMSs and applications to allow interoperability.

A number of well-known research prototypes for tightly coupled (DDTS, MERMAID [SL 90, Tho 90]) and loosely coupled (MRDSM, SISYPHUS [SL 90]) federations have been reported. We focus further on the loosely coupled federations, which provide mechanisms to specify export schemas, represented in a uniform data model, for the databases participating in a federation (an overall global schema is not required). Existing federated multidatabase management architectures and methodologies [Tho 90] mainly emphasize technical (system) problems, giving an application the technical ability to use any collection of databases from the federation. Usually no application semantics is reflected in the export schemas; therefore the export schemas may not be semantically equivalent to the local schemas of the original databases. So the gap between the industrial solutions that give resources the technical ability to interoperate and semantic interoperability is still very large.

Background and Existing Tools

The MIND project team has a strong background in object-oriented technology, database management systems and graphical user interfaces [ADE 93, Dog 93, Dog 94, DEOO 94, DOBS 94].

International Standards

CORBA

The Common Object Request Broker Architecture (CORBA) and Specification defines a framework for different ORB (Object Request Broker) implementations to provide common ORB services and interfaces to support portable clients and implementations of objects. The ORB provides mechanisms by which objects transparently make requests and receive responses [CORBA 92].
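
The request path described above can be illustrated with a minimal in-process sketch. It is not the CORBA API: `ToyBroker`, `Stub` and `RelationalServant` are hypothetical names, and a real ORB would add IDL-generated stubs and skeletons, marshalling and network transport.

```python
class ToyBroker:
    """In-process stand-in for an ORB: routes requests to registered objects."""

    def __init__(self):
        self._servants = {}

    def register(self, name, servant):
        # Make an object implementation reachable under a name.
        self._servants[name] = servant

    def invoke(self, name, operation, *args):
        # Locate the implementation and deliver the request to it.
        servant = self._servants[name]
        return getattr(servant, operation)(*args)


class Stub:
    """Client-side proxy: turns method calls into broker requests."""

    def __init__(self, broker, name):
        self._broker = broker
        self._name = name

    def __getattr__(self, operation):
        # Only reached for names that are not real attributes of the stub.
        def call(*args):
            return self._broker.invoke(self._name, operation, *args)
        return call


class RelationalServant:
    """Hypothetical object implementation wrapping a relational DBMS."""

    def execute(self, query):
        return f"relational result for: {query}"


broker = ToyBroker()
broker.register("informix", RelationalServant())
stub = Stub(broker, "informix")
reply = stub.execute("SELECT 1")
```

The client only holds the stub, so it does not need to know where or how the implementation runs, which is the transparency the paragraph refers to.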

ODMG

Object Database Management Group (ODMG) [ODMG 94] has put forward a set of standards for allowing Object Oriented DBMSs to interoperate. For this the data schema, programming language binding, and data manipulation and query languages must be portable.

The object model supported by ODMG has object as the basic modeling primitive. Objects can be categorized into types. The behavior of objects is defined by a set of operations that can be executed on an object of the type. The state of objects is defined by a set of properties. These properties may be either attributes of the object itself or relationships between the object and one or more other objects.
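
As a rough illustration of these notions, the following sketch models a type with attributes, relationships and operations, and an object whose state holds its property values. The class names `ObjectType` and `Obj` and the `Employee` example are assumptions made for illustration; they are not part of the ODMG standard or its language bindings.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectType:
    name: str
    attributes: dict = field(default_factory=dict)     # attribute name -> value type
    relationships: dict = field(default_factory=dict)  # relationship name -> target type name
    operations: dict = field(default_factory=dict)     # operation name -> callable


@dataclass
class Obj:
    type: ObjectType
    state: dict = field(default_factory=dict)          # property name -> current value

    def call(self, operation, *args):
        # Behavior is looked up on the type, not on the individual object.
        return self.type.operations[operation](self, *args)


employee_type = ObjectType(
    name="Employee",
    attributes={"name": str, "salary": int},
    relationships={"works_in": "Department"},
    operations={"annual_salary": lambda obj: obj.state["salary"] * 12},
)

ann = Obj(employee_type, {"name": "Ann", "salary": 1000, "works_in": "R&D"})
```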

Description of the Project

The project assumes an extensible platform of autonomous database management systems wishing to participate in the federation. Although a set of tools and standards for easy integration and participation of any database management system will be supplied as part of the outcome of the project, initially INFORMIX and MOOD (METU-Object Oriented Database management system) [ADE 93, Dog 93, Dog 94, DEOO 94, DOBS 94], as representatives of traditional and new technologies, are planned to be used in the project.

In this project, in order to adopt international standards and to be compatible with current trends, CORBA (Common Object Request Broker Architecture) [CORBA 92] will be taken as the base of the design. As defined by the Object Management Group (OMG), CORBA is a technology providing low-level support for interoperability in heterogeneous distributed environments. The use of CORBA will make readily available the language mappings required for the implementation of the general extensible stub of the interoperable system server and the skeletons for the participating DBMSs. Furthermore, the design and implementation of transaction management facilities [Wei 91] with extensions to federated systems will be simplified through CORBA. Another advantage of this approach will be the availability of guidelines and standards to be followed by the database management systems for participation in the federation, making the process easily acceptable as a de facto standard.

Interoperation of multiple heterogeneous database management systems will be achieved through a canonical data model [BM 93, Kal 90, KAN 93] with extensions for supporting both object-oriented and relational data models and with constructs to enrich component schemas, making semantic integration possible. The canonical data model will mainly be based on the ODMG object model [ODMG 94].

Based on the canonical data model, the mappings from/to participating database management systems' data model will be provided by the project.
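
A minimal sketch of such a mapping is given below for the relational case. It assumes, purely for illustration, that a table is described by a dictionary and that foreign keys become relationships while the remaining columns become attributes; the function names and the `EMP` table are hypothetical and do not describe the project's actual mapping rules.

```python
def table_to_canonical(table):
    """Map a relational table description to a canonical type description."""
    foreign_keys = table.get("foreign_keys", {})
    return {
        "type": table["name"],
        "key": table["primary_key"],
        # Plain columns become attributes of the canonical type.
        "attributes": {c: t for c, t in table["columns"].items() if c not in foreign_keys},
        # Foreign-key columns become relationships to other types.
        "relationships": dict(foreign_keys),
    }


def canonical_to_table(canonical, fk_type="INTEGER"):
    """Inverse mapping; assumes every foreign-key column has type `fk_type`."""
    columns = dict(canonical["attributes"])
    for relationship in canonical["relationships"]:
        columns[relationship] = fk_type
    return {
        "name": canonical["type"],
        "primary_key": canonical["key"],
        "columns": columns,
        "foreign_keys": dict(canonical["relationships"]),
    }


emp_table = {
    "name": "EMP",
    "primary_key": "EMPNO",
    "columns": {"EMPNO": "INTEGER", "ENAME": "CHAR(30)", "DEPTNO": "INTEGER"},
    "foreign_keys": {"DEPTNO": "DEPT"},
}
emp_canonical = table_to_canonical(emp_table)
```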

As the query language, an object-oriented SQL-like language [LA 87, GV 92, SRL 93] based on the canonical data model will be designed and implemented. It will also contain features such as compensating statements [SRL 93] for use in the presence of participating systems with different or non-existent commit protocols, naming facilities, and extended recovery constructs. The design and implementation of query distribution and optimization, with its asynchronous nature in federated systems, is a challenging problem and is another outcome of the project. The optimization constraint of minimizing the overall execution time makes the query distribution problem even harder. In this project, not only the optimization of a single query but also the multiple query optimization problem will be addressed.

In distributing queries, some modifications to the query are necessary due to structural and semantic differences in the integrated schemas.
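
The kind of modification meant here can be sketched as follows. The query representation, the per-site mapping and the `scale` factor (a site storing annual rather than monthly salaries) are assumptions chosen for illustration only.

```python
def modify_for_site(query, mapping):
    """Rewrite a global query into the names and units used by one site."""
    attr = mapping["attributes"]      # global attribute name -> local name
    scale = mapping.get("scale", {})  # global attribute name -> unit factor
    where = [
        (attr[a], op, value * scale.get(a, 1))
        for a, op, value in query["where"]
    ]
    return {
        "select": [attr[a] for a in query["select"]],
        "from": mapping["type"],
        "where": where,
    }


global_query = {
    "select": ["name", "salary"],
    "from": "Employee",
    "where": [("salary", ">", 1000)],  # monthly salary in the integrated schema
}

# Hypothetical site that names things differently and stores annual salaries.
site_mapping = {
    "type": "EMP",
    "attributes": {"name": "ENAME", "salary": "ANNUAL_SAL"},
    "scale": {"salary": 12},
}

local_query = modify_for_site(global_query, site_mapping)
```

The structural difference is handled by renaming and the semantic difference by rescaling the constant in the predicate.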

Project Plan

The project will evolve through a bottom-up approach and will consist of two main stages.

1 CORBA compliant level: This stage will encompass the design of a toolbox for the registration of voluntary participations in the federation, entailing concurrency control, authentication, and transaction management.

2 Secure Query and Semantic Interoperation Level: This stage captures the query language design and optimization issues, schema integration and modification of queries and the necessary toolbox.

The design stages as dictated by the previous section will be as follows:

Design and implementation of skeletons in IDL for DBMSs willing to participate in the federation, and the design of stubs in IDL for the interoperable system server. Since the design of the federated system is based on CORBA [CORBA 92], the project will initially start by registering one relational and one object-oriented DBMS on top of CORBA, namely INFORMIX and MOOD [ADE 93, Dog 93, Dog 94, DEOO 94, DOBS 94]. For this, the operations on these DBMSs will be defined and kept in the interface repository. Also, the stubs for communicating with these DBMSs at the interoperable system server site will be available. The general architecture of the system is as shown in Figure 1. At this stage the ease of registering other DBMSs to the federation will be tested, and tools to aid and, as far as possible, automate this process will be implemented if necessary.

Design and implementation of a transaction manager. In this project, transaction management facilities will be supplied to manage distributed transactions on top of CORBA. The existence of DBMSs with different or non-existent commit protocols raises the issue of compensating statements [SRL 93], which the application programmer must specify at the language level. The transaction management facilities supplied by the system will include the sub-transaction begin, prepare to commit, commit, rollback and end commands of a simplified version of the TP protocol, and the MIND system will have a two-level nested transaction model, namely local and global transactions. At this level, facilities to be used at the language level for ordering events issued to different sites will also be provided, as will the algorithms for global concurrency control. All transaction management, concurrency control and event ordering facilities will be supplied as a layer on top of CORBA (see Figure 1).
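
How these commands and compensating statements might fit together is sketched below. This is an assumption-laden illustration, not the project's transaction manager: `Site` and `run_global_transaction` are hypothetical, the end command and event ordering are omitted, and sites without a prepare phase are simply committed first and compensated if the global transaction aborts.

```python
class Site:
    """Hypothetical participant; `log` records the commands it received."""

    def __init__(self, name, supports_prepare=True, fail_on_prepare=False):
        self.name = name
        self.supports_prepare = supports_prepare
        self.fail_on_prepare = fail_on_prepare
        self.log = []

    def begin(self):
        self.log.append("begin")

    def prepare(self):
        if self.fail_on_prepare:
            return False
        self.log.append("prepare")
        return True

    def commit(self):
        self.log.append("commit")

    def rollback(self):
        self.log.append("rollback")

    def compensate(self):
        # Stands for running the programmer-supplied compensating statement.
        self.log.append("compensate")


def run_global_transaction(sites):
    for site in sites:
        site.begin()
    two_phase = [s for s in sites if s.supports_prepare]
    one_phase = [s for s in sites if not s.supports_prepare]
    # Sites without a prepare phase can only commit directly.
    for site in one_phase:
        site.commit()
    for site in two_phase:
        if not site.prepare():
            # Undo everything: roll back where possible, compensate elsewhere.
            for s in two_phase:
                s.rollback()
            for s in one_phase:
                s.compensate()
            return "aborted"
    for site in two_phase:
        site.commit()
    return "committed"


a = Site("A")
b = Site("B", supports_prepare=False)
c = Site("C", fail_on_prepare=True)
outcome = run_global_transaction([a, b, c])
```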

Design of the canonical data model and definition of the mappings from/to participating DBMSs to achieve semantic interoperation. CORBA by itself is not enough to achieve semantic interoperation, since an integrated view of the component schemas is necessary. The canonical data model representation of a component DBMS and its language facilities should be equivalent to its original representation. This is a necessary prerequisite for database update and semantic integration. Comprehensive models and languages are needed to reach this equivalence of description. Positive experience gained from the development of quite general models and languages in related directions, such as Knowledge Interchange Format [GF 92], X3H4 IRDS investigations [Bro 93], and recent work on well-defined object data models, algebras and calculus [BM 93, KAN 93], makes the creation of such a well-grounded canonical model quite feasible. The canonical data model [BM 93, Kal 90, KAN 93] will be based on the ODMG-93 [ODMG 94] data model. It is powerful enough to support both relational and object-oriented technologies. The canonical data model will also supply constructs to specify schema semantics for integration. At this stage, the mappings, which are rules for translation between the canonical data model and the participating DBMSs' data models, will be supplied. This will provide both for a uniform view of the participating database management systems and for distributing and translating the queries to the local DBMSs.

Design and implementation of schema exporting tools. The participating DBMSs should be able, without sacrificing their autonomy, to specify the parts of their databases available for federated use and also to hide their data. The MIND system will provide tools for specifying the schema to be exported. The exported schema will be translated to the canonical model automatically.
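
In its simplest form, exporting amounts to projecting the local schema onto the parts the owner agrees to share. The sketch below assumes schemas are plain dictionaries; `export_schema` and the sample tables are hypothetical, and the translation to the canonical model is left out.

```python
def export_schema(local_schema, exported):
    """Keep only the tables and columns the owner chose to export."""
    return {
        table: {col: typ for col, typ in columns.items() if col in exported[table]}
        for table, columns in local_schema.items()
        if table in exported
    }


local_schema = {
    "EMP": {"ENAME": "CHAR(30)", "SALARY": "MONEY", "SSN": "CHAR(11)"},
    "PAYROLL": {"EMPNO": "INTEGER", "BANK_ACCOUNT": "CHAR(20)"},
}

# The owner exports two columns of EMP and hides everything else.
export_spec = {"EMP": {"ENAME", "SALARY"}}

export = export_schema(local_schema, export_spec)
```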

Design and implementation of the schema integrator. At this stage, a view definition mechanism enabling integration of relevant imported schemas for a specific application will first be developed. The view definition mechanism will specify the rules or conversion routines for resolving data and behaviour heterogeneities. The schema integrator tool for resolving semantic heterogeneity assumes that the mappings between the data models of the participating DBMSs and the ODMG [ODMG 94] based canonical data model have been supplied previously. This tool [SLCN 88, SM 89, BLN 86, HR 90, Ken 91, NS 88, SN 88, FKN 91, NK 89, DH 84, SM 91] will be able to aid the user in acquiring knowledge about the underlying data sources. Matching attributes, functions and constraints over these components should be provided by the user as the result of an initial analysis of the relevant component schemas. Later, with respect to the application requirements of a user or user group, the class taxonomy of interest will be generated automatically. The important task to be performed in this step is the classification of all possible conflicts and the definition of methods for resolving them. The reasoning about whether a database is applicable to a given application (perhaps after some contextual, structural, behavioral, extensional, etc. reconciliation) should be complete. The basic idea in application view oriented schema integration is to consider each query issued in the query language of our federated system as representing a separate user view. This also permits the grouping of queries having a similar view of the system. This method of schema integration allows user queries to be classified with respect to the available views and new views to be created dynamically if necessary.

Design and implementation of an object-oriented SQL together with distribution and optimization facilities. The system will support an ad-hoc query language, namely an SQL-like language supporting object-oriented features. The local autonomy of the participating DBMSs and the federated nature of the system dictate that the language have facilities for control over rollbacks and transaction management, ordering of replies from different sites, and generic naming facilities for data with or without location transparency. The queries issued will be validated against the integrated schema and modified by the query modifier tool. Query processing will take place after this process. The main challenge in query processing in federated systems stems from the autonomy, data distribution and heterogeneity properties. Together these make query distribution and optimization in federated systems a rather different problem from that in distributed systems. Several factors must be weighed in the presence of the autonomy of the local DBMSs: the characteristics of the underlying hardware of the participating DBMSs, the system load, the query patterns issued both to the MIND system and to the local DBMSs, the query optimization strategies, the operations supported, and the statistics attainable from the participants. Taken together, these factors show that both query optimization and multiple query optimization to minimize average response time are important in federated systems. In this project, a query optimizer for federated systems that considers all these problems will be developed and implemented [LST 91, MS 91, YC 84].
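
One idea behind multiple query optimization, sharing subqueries that several global queries have in common, can be sketched as follows. The subquery labels and cost figures are invented for illustration, and the sketch compares total work rather than the average response time the project aims to minimize.

```python
def plan_cost(queries, cost, share=True):
    """Total estimated cost of a batch of queries.

    Each query is a set of subquery labels. With sharing, a subquery that
    appears in several queries is executed only once.
    """
    if share:
        distinct = set().union(*queries)
        return sum(cost[s] for s in distinct)
    return sum(cost[s] for q in queries for s in q)


# Hypothetical subqueries sent to the two initial participants.
cost = {"emp@informix": 5, "dept@mood": 3, "proj@mood": 4}
q1 = {"emp@informix", "dept@mood"}
q2 = {"emp@informix", "proj@mood"}

unshared = plan_cost([q1, q2], cost, share=False)
shared = plan_cost([q1, q2], cost, share=True)
```

Here the subquery on `emp@informix` is common to both queries, so executing it once lowers the batch cost from 17 to 12 units.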

Design and implementation of the interoperable system server. This step is a system integration study. At this stage, the project will register all the modules developed so far on top of CORBA and will guarantee interoperability of the participating DBMSs.

System Architecture

The general architecture of the system is as shown in Figure 1 below.

Figure 1. General Architecture of the System

Canonical Data Model

The canonical data model to be used in the project for the Federated Database Management System will be based on ODMG-93 [ODMG 94]. ODMG-93 differentiates between mutable and immutable entities. Besides attributes and operations, relationships between mutable object types are also realized in this model. In this model, there is a built-in type hierarchy and support for a nested transaction model with dynamic scoping of rules as well. It also allows the specification of keys and extensions. To define interfaces to object types conforming to the ODMG-93 object model, the Object Definition Language (ODL) is used. ODL facilitates portability of database schemas across conforming database systems. ODL is the declarative portion of C++ ODL/OML. The C++ binding of ODL is expressed as a class library and an extension to the standard C++ class definition grammar. The class library provides classes and functions to implement the concepts defined in the ODMG object model [ODMG 94]. The extension consists of a single additional keyword and associated syntax that add declarative support for relationships to the C++ class declaration. OML stands for Object Manipulation Language. It is the language used for retrieving objects from the database and modifying them. The C++ OML syntax and semantics are those of standard C++ in the context of the standard class library. The Object Database Management Group also describes an Object Query Language, named OQL, supporting the ODMG data model [ODMG 94]. The ODMG-93 data model will be enriched with constructs for entity identification and attribute value conflict resolution necessary in schema integration.

Transaction Management

In contrast to traditional homogeneous distributed database systems, there are two kinds of transactions in Federated Database Systems (FDBS), namely local transactions and global transactions. A local transaction is submitted to and executed by the local participating DBMS without involving the FDBS. A global transaction, which accesses data controlled by more than one DBMS, is submitted to the FDBS and decomposed into several subtransactions to be executed by different local DBMSs [HSL 94].
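
The decomposition step can be sketched as grouping the operations of a global transaction by the DBMS that controls each data item. The catalog and the operation format below are assumptions for illustration; they are not taken from [HSL 94].

```python
from collections import defaultdict


def decompose(global_transaction, catalog):
    """Split a global transaction into one subtransaction per local DBMS.

    `catalog` maps each data item to the DBMS that controls it.
    The relative order of operations within each site is preserved.
    """
    subtransactions = defaultdict(list)
    for operation, item in global_transaction:
        subtransactions[catalog[item]].append((operation, item))
    return dict(subtransactions)


# Hypothetical placement of data items at the two initial participants.
catalog = {"emp": "INFORMIX", "dept": "MOOD", "proj": "MOOD"}
global_txn = [("read", "emp"), ("write", "dept"), ("read", "proj"), ("write", "emp")]

subs = decompose(global_txn, catalog)
```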

The objective of transaction management in an FDBS is to guarantee serializable execution of local and global transactions. Difficulties arise from the necessity to maintain the autonomy of each participating DBMS. Because of local autonomy, each participating DBMS may use a different mechanism for transaction management, which cannot be changed. Furthermore, the control information in each participating DBMS cannot be revealed to the FDBS without the agreement of the participating DBMS [HSL 94].

The transaction management problem in FDBS has attracted a lot of interest from the database community. A number of FDBS transaction management algorithms have been proposed for a failure-free environment. Recently, researchers have addressed the issue of transaction management in a failure-prone environment and a number of proposals have been made [VW 92]. Each proposed algorithm imposes some restrictions affecting different aspects of local autonomy. These restrictions include:

The objective of transaction execution in centralized or distributed database systems is to achieve serializability. However, transaction execution in a failure-prone FDBS environment is different from that in centralized or distributed database systems. To achieve atomic commitment of transactions in an FDBS environment, where each participating DBMS may not provide a visible prepared state, the 2PC protocol can be simulated by providing a server on top of each pre-existing DBMS. However, unlike the real 2PC protocol in distributed database systems, the simulated 2PC protocol in an FDBS has to address the problem that a subtransaction may be aborted by the local DBMS when it is considered by the server to be in prepared state. In traditional serializability, only those operations which belong to committed transactions in a history are considered. However, in a failure-prone FDBS environment, the read operations of a transaction which is aborted by the local DBMS when it is in prepared state should also be taken into account. This suggests that global serializability should be modified to consider the effect of the read operations of the aborted but prepared transactions in a failure-prone FDBS environment. This revised global serializability must be satisfied by any FDBS concurrency control algorithm to work correctly under a failure-prone environment [HSL 94].
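
One possible reading of this revision is sketched below as a conflict-graph test. It is an illustration only and not the formal definition given in [HSL 94]: the history format, the status labels and the function name are assumptions, and the test used is ordinary conflict serializability applied to a larger set of operations.

```python
def is_serializable(history, status, include_prepared_reads):
    """Acyclicity test on the conflict graph of a history.

    history: list of (transaction, "r" or "w", item) in execution order.
    status:  transaction -> "committed", "aborted" or "aborted_prepared".
    With include_prepared_reads=True, the reads of transactions aborted in
    prepared state are kept, as the revised criterion suggests.
    """
    def relevant(txn, op):
        if status[txn] == "committed":
            return True
        return include_prepared_reads and status[txn] == "aborted_prepared" and op == "r"

    ops = [(t, op, x) for t, op, x in history if relevant(t, op)]

    # Edge t1 -> t2 when an operation of t1 precedes a conflicting one of t2.
    edges = {}
    for i, (t1, op1, x1) in enumerate(ops):
        for t2, op2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.setdefault(t1, set()).add(t2)

    visiting, done = set(), set()

    def has_cycle(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(has_cycle(n) for n in edges.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return not any(has_cycle(t) for t in list(edges))


# T1 reads x before T2 writes it, and reads y after T2 has written it.
history = [("T1", "r", "x"), ("T2", "w", "x"), ("T2", "w", "y"), ("T1", "r", "y")]
status = {"T1": "aborted_prepared", "T2": "committed"}

traditional = is_serializable(history, status, include_prepared_reads=False)
revised = is_serializable(history, status, include_prepared_reads=True)
```

In this example the traditional test ignores T1 altogether and accepts the history, whereas keeping the reads of T1 exposes a cycle between T1 and T2.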

In this project, we adopt the transaction management for federated autonomous systems as developed in [HSL 94]. In their work, the failure problem in FDBS is analyzed and the definition of global serializability is modified.

Schema Integration

A powerful schema integration methodology is the key to federated systems. In the MIND project, we propose to design and implement a tool for schema and instance level integration [LSPR 93].

A view definition mechanism enabling integration of relevant imported schemas of the participating databases for a specific application will be developed in this project. The basic idea in application view oriented schema integration is to consider each query issued in the query language of our federated system as representing a separate user view.
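
A view definition of this kind might pair each attribute of the view with a source attribute and a conversion routine. The following sketch assumes that representation; the source names, attribute names and conversions are hypothetical.

```python
def build_view(view_definition, sources):
    """Evaluate a view over several sources.

    view_definition: source name -> {view attribute: (source attribute, conversion)}
    sources:         source name -> list of records (dictionaries)
    """
    rows = []
    for source_name, rules in view_definition.items():
        for record in sources[source_name]:
            rows.append({
                target: convert(record[source_attr])
                for target, (source_attr, convert) in rules.items()
            })
    return rows


# Hypothetical imported data: one source stores monthly, the other annual salaries.
sources = {
    "informix": [{"ENAME": "ALICE", "MONTHLY_SAL": 1000}],
    "mood": [{"name": "Bob", "annualSalary": 15000}],
}

view_definition = {
    "informix": {
        "name": ("ENAME", str.title),
        "salary": ("MONTHLY_SAL", lambda monthly: monthly * 12),
    },
    "mood": {
        "name": ("name", str),
        "salary": ("annualSalary", int),
    },
}

employees = build_view(view_definition, sources)
```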

The interest in schema integration techniques [SLCN 88, SM 89, BLN 86, HR 90, Ken 91, NS 88, SN 88, FKN 91, NK 89, DH 84, SM 91] is significantly increasing. The tool to be designed in this project will deliberately separate the schema integration process into parts requiring user intervention and parts which can be fully automated. This will help to minimize the human expertise necessary in this process. To address the problems appearing both at the schema level and at the instance level, the schema integrator tool to be developed will contain features for entity identification, attribute value conflict resolution, and schematic discrepancy realization.
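
At the instance level, entity identification and attribute value conflict resolution can be sketched as below. The sketch assumes that a shared key identifies the same real-world entity in both sources and that the user supplies one resolution function per conflicting attribute; both assumptions, and all names used, are illustrative.

```python
def integrate(records_a, records_b, key, resolve):
    """Merge two record sets.

    Records with equal `key` values are taken to describe the same entity.
    When both sources give different values for an attribute, the
    user-supplied function in `resolve` decides the merged value.
    """
    merged = {r[key]: dict(r) for r in records_a}
    for record in records_b:
        if record[key] not in merged:
            merged[record[key]] = dict(record)
            continue
        target = merged[record[key]]
        for attr, value in record.items():
            if attr in target and target[attr] != value:
                target[attr] = resolve[attr](target[attr], value)
            else:
                target[attr] = value
    return list(merged.values())


source_a = [{"ssn": 1, "name": "Ann", "salary": 100}]
source_b = [
    {"ssn": 1, "name": "Ann", "salary": 120, "dept": "R&D"},
    {"ssn": 2, "name": "Bob", "salary": 90},
]

# Hypothetical policy: on a salary conflict, keep the larger value.
integrated = integrate(source_a, source_b, key="ssn", resolve={"salary": max})
```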

Query Processing and Optimization

Manipulation of data located in different databases requires functions that do not exist in currently used query languages such as SQL [SRL 93]. New features that should be provided for the manipulation of FDBSs include the following [SRL 93]:

The main challenges in processing federated database queries originate from data distribution, heterogeneity and autonomy. The impact of data distribution has been well studied in distributed database research, but the impact of heterogeneity and autonomy has not. It is therefore important to study how the problems arising from heterogeneity and autonomy can be appropriately handled in federated query processing [LS 92]. The major differences between query processing and federated query processing include: