I-BITE is intended as an organized forum bringing together European industrial, business and Information Technology actors interested in the application of available and emerging solutions to problems of Interoperability in information systems. We plan to get feedback from the conclusions drawn in I-BITE.
IMPRESS develops a persistent multimedia object-oriented programming language to support sophisticated applications requiring the interoperability of objects and methods possibly located at different sites. The MIND project will go beyond IMPRESS by providing semantic interoperation between heterogeneous database management systems.
EDS II is developing a parallel machine whose main application is a parallel database server. The query language of the database server, called ESQL [GV 92], extends SQL to support, in particular, complex objects, methods and generalization. The features of this language will aid the design and implementation of the SQL-like language to be developed in the MIND project.
FIDE focuses on four main areas: the definition of type systems for bulk types, the specification of a canonical object store, design methodology, and transaction processing. The experience of the FIDE project will aid us in the design of the canonical data model in the MIND project.
A number of well-known research prototypes have been reported for tightly coupled (DDTS, MERMAID [SL 90, Tho 90]) and loosely coupled (MRDSM, SISYPHUS [SL 90]) federations. We focus on loosely coupled federations, which provide mechanisms for specifying export schemas, represented in a uniform data model, for the databases participating in a federation (an overall global schema is not required). Existing federated multidatabase management architectures and methodologies [Tho 90] mainly emphasize technical (system) problems, giving an application the technical ability to use any collection of databases from the federation. Usually no application semantics is reflected in the export schemas, so the export schemas may not be semantically equivalent to the local schemas of the original databases. The gap between industrial solutions that give resources the technical ability to interoperate and true semantic interoperability is therefore still very large.
The object model supported by ODMG has object as the basic modeling primitive. Objects can be categorized into types. The behavior of objects is defined by a set of operations that can be executed on an object of the type. The state of objects is defined by a set of properties. These properties may be either attributes of the object itself or relationships between the object and one or more other objects.
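As a rough illustration of these notions, the following Python sketch attaches properties (attributes and relationships) and operations to a type and lets objects of that type carry state. The names ObjectType, Instance, Employee, Department and raise_salary are hypothetical and are not taken from ODMG-93; the sketch only shows how state and behavior relate to a type.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ObjectType:
    """A type: names the properties and operations its objects have."""
    name: str
    attributes: Dict[str, type] = field(default_factory=dict)
    relationships: Dict[str, str] = field(default_factory=dict)  # name -> target type name
    operations: Dict[str, Callable[..., Any]] = field(default_factory=dict)


@dataclass
class Instance:
    """An object: state is a set of property values, behavior comes from its type."""
    otype: ObjectType
    state: Dict[str, Any] = field(default_factory=dict)

    def invoke(self, operation: str, *args: Any) -> Any:
        # Operations are looked up on the type, not on the individual object.
        return self.otype.operations[operation](self, *args)


def raise_salary(emp: Instance, percent: float) -> float:
    """Hypothetical operation defined for the Employee type."""
    emp.state["salary"] *= 1 + percent / 100
    return emp.state["salary"]


employee_type = ObjectType(
    name="Employee",
    attributes={"name": str, "salary": float},
    relationships={"works_in": "Department"},
    operations={"raise_salary": raise_salary},
)
department_type = ObjectType(name="Department", attributes={"title": str})

sales = Instance(department_type, {"title": "Sales"})
# The relationship value is a reference to another object, not a copied value.
alice = Instance(employee_type, {"name": "Alice", "salary": 1000.0, "works_in": sales})
new_salary = alice.invoke("raise_salary", 10)
```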
To adopt international standards and remain compatible with current trends, this project will take CORBA (Common Object Request Broker Architecture) [CORBA 92] as the basis of the design. As defined by the Object Management Group (OMG), CORBA is a technology providing low-level support for interoperability in heterogeneous distributed environments. The use of CORBA will make readily available the language mappings required to implement a general, extensible stub for the interoperable system server and the skeletons for the participating DBMSs. Furthermore, CORBA will simplify the design and implementation of transaction management facilities [Wei 91] with extensions to federated systems. Another advantage of this approach is the availability of guidelines and standards to be followed by database management systems wishing to participate in the federation, which should make the process easier to accept as a de facto standard.
Interoperation of multiple heterogeneous database management systems will be achieved through a canonical data model [BM 93, Kal 90, KAN 93] with extensions for supporting both object-oriented and relational data models and with constructs to enrich component schemas, making semantic integration possible. The canonical data model will mainly be based on the ODMG-93 object model [ODMG 94].
Based on the canonical data model, the project will provide mappings from and to the data models of the participating database management systems.
The query language will be an object-oriented, SQL-like language [LA 87, GV 92, SRL 93] designed and implemented on the basis of the canonical data model. It will also contain features such as compensating statements [SRL 93], for use when participating systems have different or non-existent commit protocols, naming facilities, and extended recovery constructs. The design and implementation of query distribution and optimization, given its asynchronous nature in federated systems, is a challenging problem, and its solution is another expected outcome of the project. The constraint of minimizing the overall execution time makes the query distribution problem even harder. The project will address not only the optimization of a single query but also the multiple query optimization problem.
In distributing queries some modifications on the query are necessary due to structural and semantic differences in the schemas integrated.
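A minimal sketch of such a modification is given below, assuming a single global class whose attributes are named differently, and in one case measured in different units, at two component sites. The site names, attribute names, the table name employee and the yearly/monthly salary convention are all hypothetical; the project's query modifier is not specified at this level of detail.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class AttrMapping(NamedTuple):
    local_name: str
    to_local: Callable[[float], float]  # converts a global value into local units


# Hypothetical mappings from the integrated schema to two component schemas.
MAPPINGS: Dict[str, Dict[str, AttrMapping]] = {
    "site_a": {
        "name": AttrMapping("emp_name", lambda v: v),
        # Assumed semantic difference: global salary is yearly, site_a stores monthly pay.
        "salary": AttrMapping("monthly_pay", lambda v: v / 12),
    },
    "site_b": {
        "name": AttrMapping("fullname", lambda v: v),
        "salary": AttrMapping("salary", lambda v: v),
    },
}


def modify_query(site: str, select: List[str], where: Tuple[str, str, float]) -> str:
    """Rewrite a global (select, where) pair into SQL text for one site."""
    mapping = MAPPINGS[site]
    columns = ", ".join(mapping[a].local_name for a in select)
    attr, op, value = where
    m = mapping[attr]
    return f"SELECT {columns} FROM employee WHERE {m.local_name} {op} {m.to_local(value)}"


query_a = modify_query("site_a", ["name"], ("salary", ">", 24000))
query_b = modify_query("site_b", ["name"], ("salary", ">", 24000))
```

The same global predicate thus yields a renamed and rescaled condition at site_a and only a renamed projection at site_b.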
1. CORBA-compliant level: This stage will encompass the design of a toolbox for the registration of voluntary participants in the federation, entailing concurrency control, authentication, and transaction management.
2. Secure query and semantic interoperation level: This stage covers query language design and optimization issues, schema integration, modification of queries, and the necessary toolbox.
The design stages as dictated by the previous section will be as follows:
Design and implementation of skeletons in IDL for DBMSs willing to participate in the federation, and design of stubs in IDL for the interoperable system server. Since the design of the federated system is based on CORBA [CORBA 92], the project will start by registering one relational and one object-oriented DBMS on top of CORBA, namely INFORMIX and MOOD [ADE 93, Dog 93, Dog 94, DEOO 94, DOBS 94]. For this purpose, the operations on these DBMSs will be defined and kept in the interface repository. The stubs for communicating with these DBMSs will also be available at the interoperable system server site. The general architecture of the system is shown in Figure 1. At this stage the ease of registering other DBMSs with the federation will be tested and, if necessary, tools to aid and automate this process as much as possible will be implemented.
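The registration idea can be pictured with the following plain Python sketch. It does not use CORBA or IDL; the classes InterfaceRepository and Stub and the single operation execute are hypothetical stand-ins for the interface repository, the server-side stubs and the DBMS skeletons, and the two registered entries are placeholders rather than real INFORMIX or MOOD interfaces.

```python
from typing import Any, Callable, Dict, List


class InterfaceRepository:
    """Keeps, per registered DBMS, the operations it offers to the federation."""

    def __init__(self) -> None:
        self._skeletons: Dict[str, Dict[str, Callable[..., Any]]] = {}

    def register(self, dbms: str, operations: Dict[str, Callable[..., Any]]) -> None:
        self._skeletons[dbms] = dict(operations)

    def operations_of(self, dbms: str) -> List[str]:
        return sorted(self._skeletons[dbms])

    def dispatch(self, dbms: str, operation: str, *args: Any) -> Any:
        # Skeleton side: find the registered implementation and call it.
        return self._skeletons[dbms][operation](*args)


class Stub:
    """Proxy used at the interoperable system server site."""

    def __init__(self, repo: InterfaceRepository, dbms: str) -> None:
        self._repo, self._dbms = repo, dbms

    def __getattr__(self, operation: str) -> Callable[..., Any]:
        # Any operation name not defined on the stub is forwarded to the DBMS.
        return lambda *args: self._repo.dispatch(self._dbms, operation, *args)


repo = InterfaceRepository()
# Placeholder implementations; a real skeleton would call the local DBMS.
repo.register("INFORMIX", {"execute": lambda q: f"INFORMIX result of {q}"})
repo.register("MOOD", {"execute": lambda q: f"MOOD result of {q}"})

reply = Stub(repo, "MOOD").execute("Q1")
```

Registering a further DBMS then amounts to one more register call, which is the property the stage intends to test.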
Design and implementation of a transaction manager. In this project, transaction management facilities will be supplied to manage distributed transactions on top of CORBA. The existence of DBMSs with different or non-existent commit protocols raises the issue of compensating statements [SRL 93], which must be specified by the application programmer at the language level. The transaction management facilities supplied by the system will include the sub-transaction begin, prepare to commit, commit, rollback and end commands of a simplified version of the TP protocol, and the MIND system will have a two-level nested transaction model, namely local and global transactions. Facilities to be used at the language level for ordering events issued to different sites will also be provided at this level, together with algorithms for global concurrency control. All transaction management, concurrency control and event ordering facilities will be supplied as a layer on top of CORBA (see Figure 1).
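The role of compensating statements can be sketched as follows. This is a simplified simulation under stated assumptions, not the TP protocol of the project: SubTransaction, run_global and the will_succeed flag are hypothetical, a site that cannot prepare is assumed to commit its work immediately, and the compensating action is an arbitrary callable supplied by the programmer.

```python
from typing import Callable, List, Optional


class SubTransaction:
    """Work done at one site on behalf of a global transaction."""

    def __init__(self, site: str, can_prepare: bool, will_succeed: bool = True,
                 compensate: Optional[Callable[[], None]] = None) -> None:
        self.site = site
        self.can_prepare = can_prepare      # False: the site commits immediately
        self.will_succeed = will_succeed    # simulated outcome of the local work
        self.compensate = compensate        # programmer-supplied undo action
        self.state = "active"

    def run(self) -> bool:
        if not self.will_succeed:
            self.state = "aborted"
            return False
        # Sites without a prepare phase have already committed at this point.
        self.state = "prepared" if self.can_prepare else "committed"
        return True


def run_global(subs: List[SubTransaction]) -> str:
    """Commit all sub-transactions, or undo the ones that already took effect."""
    done: List[SubTransaction] = []
    for sub in subs:
        if not sub.run():
            for earlier in reversed(done):
                if earlier.state == "prepared":
                    earlier.state = "rolled back"
                elif earlier.compensate is not None:
                    earlier.compensate()
                    earlier.state = "compensated"
                # A committed site without a compensating action stays committed.
            return "aborted"
        done.append(sub)
    for sub in done:
        sub.state = "committed"
    return "committed"


log: List[str] = []
subs = [
    SubTransaction("site_a", can_prepare=True),
    SubTransaction("site_b", can_prepare=False,
                   compensate=lambda: log.append("undo site_b")),
    SubTransaction("site_c", can_prepare=True, will_succeed=False),
]
outcome = run_global(subs)
```

In this run the failure at site_c rolls back the prepared work at site_a and triggers the compensating action at site_b, whose work had already been committed locally.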
Design of the canonical data model and definition of the mappings from/to participating DBMSs to achieve semantic interoperation. CORBA by itself is not enough to achieve semantic interoperation, since an integrated view of the component schemas is necessary. The canonical data model representation of a component DBMS and its language facilities should be equivalent to its original representation. This is a necessary prerequisite for database update and semantic integration. Comprehensive models and languages are needed to reach this equivalence of description. Positive experience gained from the development of quite general models and languages in related directions, such as the Knowledge Interchange Format [GF 92], the X3H4 IRDS investigations [Bro 93], and recent work on well-defined object data models, algebras and calculi [BM 93, KAN 93], makes the creation of such a well-grounded canonical model quite feasible. The canonical data model [BM 93, Kal 90, KAN 93] will be based on the ODMG-93 [ODMG 94] data model, which is powerful enough to support both relational and object-oriented technologies. The canonical data model will also supply constructs for specifying schema semantics for integration. At this stage, the mappings, which are rules for translation between the canonical data model and the data models of the participating DBMSs, will be supplied. This will provide both a uniform view of the participating database management systems and a basis for distributing and translating queries to the local DBMSs.
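One direction of such a mapping, from a relational schema to a canonical class, might look like the sketch below. The classes Table and CanonicalClass, the type table SQL_TO_CANONICAL and the rule that a foreign-key column becomes a relationship are illustrative assumptions; the actual mapping rules of the project are not given in this text.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Table:
    name: str
    columns: Dict[str, str]                                       # column -> SQL type
    foreign_keys: Dict[str, str] = field(default_factory=dict)    # column -> referenced table


@dataclass
class CanonicalClass:
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)      # attribute -> canonical type
    relationships: Dict[str, str] = field(default_factory=dict)   # name -> target class


# Hypothetical correspondence between SQL types and canonical types.
SQL_TO_CANONICAL = {"INTEGER": "Long", "CHAR": "String", "FLOAT": "Float"}


def to_canonical(table: Table) -> CanonicalClass:
    """Translate one table: plain columns become attributes, foreign keys relationships."""
    cls = CanonicalClass(name=table.name.capitalize())
    for column, sql_type in table.columns.items():
        if column in table.foreign_keys:
            cls.relationships[column] = table.foreign_keys[column].capitalize()
        else:
            cls.attributes[column] = SQL_TO_CANONICAL[sql_type]
    return cls


employee = Table(
    name="employee",
    columns={"id": "INTEGER", "name": "CHAR", "dept": "INTEGER"},
    foreign_keys={"dept": "department"},
)
canonical = to_canonical(employee)
```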
Design and implementation of schema exporting tools. The participating DBMSs should be able, without sacrificing their autonomy, to specify the parts of their databases available for federated use and to hide the rest of their data. The MIND system will provide tools for specifying the schema to be exported. The exported schema will be translated to the canonical model automatically.
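A minimal sketch of an export specification follows, assuming that a local schema is a mapping from class names to attribute lists and that the local administrator lists what may be seen by the federation. The function export_schema and the sample schema are hypothetical.

```python
from typing import Dict, List, Set

LocalSchema = Dict[str, List[str]]


def export_schema(local: LocalSchema, exported: Dict[str, Set[str]]) -> LocalSchema:
    """Keep only the classes and attributes the local administrator exports."""
    result: LocalSchema = {}
    for cls, attrs in local.items():
        if cls in exported:
            # Preserve the local attribute order; drop everything not exported.
            result[cls] = [a for a in attrs if a in exported[cls]]
    return result


local_schema = {
    "employee": ["id", "name", "salary"],
    "audit_log": ["entry", "timestamp"],
}
export_spec = {"employee": {"id", "name"}}
visible = export_schema(local_schema, export_spec)
```

Here the salary attribute and the whole audit_log class remain hidden from the federation.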
Design and implementation of the schema integrator. At this stage, a view definition mechanism enabling the integration of relevant imported schemas for a specific application will first be developed. The view definition mechanism will specify the rules or conversion routines for resolving data and behaviour heterogeneities. The schema integrator tool for resolving semantic heterogeneity assumes that the mappings between the data models of the participating DBMSs and the ODMG [ODMG 94] based canonical data model have been supplied previously. This tool [SLCN 88, SM 89, BLN 86, HR 90, Ken 91, NS 88, SN 88, FKN 91, NK 89, DH 84, SM 91] will be able to aid the user in acquiring knowledge about the underlying data sources. Matching attributes, functions and constraints over these components should be provided by the user as the result of an initial analysis of the relevant component schemas. Later, with respect to the application requirements of a user or user group, the class taxonomy of interest will be generated automatically. The important task to be performed in this step is the classification of all possible conflicts and the definition of methods for resolving them. Reasoning about whether a database is applicable to a given application (perhaps after some contextual, structural, behavioral, extensional, etc. reconciliation) should be complete. The basic idea in application view oriented schema integration is to consider each query issued in the query language of our federated system as representing a separate user view. This also permits the grouping of queries that have a similar view of the system. The proposed method of schema integration allows user queries to be classified with respect to the available views and new views to be created dynamically if necessary.
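The classification of queries against available views could be sketched as below. The sketch approximates a view by the set of integrated classes it covers and a query by the set of classes it references; this approximation, the class ViewCatalog and the class names are assumptions made only for illustration.

```python
from typing import FrozenSet, List


class ViewCatalog:
    """Holds the views created so far, each reduced to the classes it covers."""

    def __init__(self) -> None:
        self.views: List[FrozenSet[str]] = []

    def classify(self, referenced: FrozenSet[str]) -> int:
        """Return the index of a view covering the query, creating one if needed."""
        for i, view in enumerate(self.views):
            if referenced <= view:        # every referenced class is in the view
                return i
        self.views.append(referenced)     # no existing view fits: create a new one
        return len(self.views) - 1


catalog = ViewCatalog()
first = catalog.classify(frozenset({"Employee", "Department"}))
second = catalog.classify(frozenset({"Employee"}))
third = catalog.classify(frozenset({"Project"}))
```

The first two queries are grouped under the same view, while the third causes a new view to be created.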
Design and implementation of an object-oriented SQL together with distribution and optimization facilities. The system will support an ad hoc query language, namely an SQL-like language supporting object-oriented features. The local autonomy of the participating DBMSs and the federated nature of the system dictate that the language have facilities for control over rollbacks and transaction management, for ordering replies from different sites, and for generic naming of data with or without location transparency. The queries issued will be validated against the integrated schema and modified by the query modifier tool. Query processing will take place after this step. The main challenge in query processing in federated systems stems from autonomy, data distribution and heterogeneity. Together these make query distribution and optimization in federated systems a rather different problem from that in distributed systems. The characteristics of the underlying hardware of the participating DBMSs, the system load, the query patterns issued both to the MIND system and to the local DBMSs, the query optimization strategies, and the operations supported and statistics obtainable from the participants under the autonomy of the local DBMSs all indicate that both single-query optimization and multiple query optimization to minimize average response time are important in federated systems. In this project a query optimizer for federated systems that takes all these problems into account will be developed and implemented [LST 91, MS 91, YC 84].
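To make the distribution problem concrete, the sketch below picks, by exhaustive search, the assignment of subqueries to sites that minimizes response time when sites work in parallel. The cost figures, site names and the cost model itself (per-site costs simply add up, communication is ignored) are hypothetical simplifications; a federated optimizer would have to work with far less reliable statistics.

```python
from itertools import product
from typing import Dict, Tuple

# Estimated cost of each subquery at each site able to serve it (hypothetical numbers).
COSTS: Dict[str, Dict[str, float]] = {
    "q1": {"site_a": 4.0, "site_b": 6.0},
    "q2": {"site_a": 5.0, "site_b": 3.0},
    "q3": {"site_b": 2.0},
}


def response_time(assignment: Dict[str, str]) -> float:
    """Sites work in parallel, so the most loaded site determines the response time."""
    load: Dict[str, float] = {}
    for subquery, site in assignment.items():
        load[site] = load.get(site, 0.0) + COSTS[subquery][site]
    return max(load.values())


def best_assignment() -> Tuple[Dict[str, str], float]:
    """Try every combination of candidate sites and keep the fastest one."""
    subqueries = sorted(COSTS)
    choices = [sorted(COSTS[q]) for q in subqueries]
    candidates = [dict(zip(subqueries, combo)) for combo in product(*choices)]
    best = min(candidates, key=response_time)
    return best, response_time(best)


plan, time_needed = best_assignment()
```

With these numbers the search places q1 at site_a and q2 and q3 at site_b. Exhaustive search is only workable for a handful of subqueries, which is one reason the optimization problem is considered hard.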
Design and implementation of the interoperable system server. This step is a system integration study. At this stage, the project will register all the modules developed so far on top of CORBA and will guarantee interoperability of the participating DBMSs.
The objective of transaction management in an FDBS is to guarantee serializable execution of local and global transactions. Difficulties arise from the necessity to maintain the autonomy of each participating DBMS. Because of local autonomy, each participating DBMS may use a different mechanism for transaction management, which cannot be changed. Furthermore, the control information in each participating DBMS cannot be revealed to the FDBS without the agreement of the participating DBMS [HSL 94].
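For reference, the textbook test for conflict serializability of a given schedule can be sketched as follows. It builds the precedence graph of conflicting operations and checks it for cycles. This is the classical centralized test, not the modified notion of global serializability discussed later, and it assumes the complete schedule is visible, which local autonomy generally prevents in an FDBS. The encoding of operations as tuples is an assumption of the sketch.

```python
from typing import Dict, List, Set, Tuple

Op = Tuple[str, str, str]  # (transaction, "r" or "w", data item)


def conflict_graph(schedule: List[Op]) -> Dict[str, Set[str]]:
    """Edge T1 -> T2 when an operation of T1 conflicts with a later one of T2."""
    graph: Dict[str, Set[str]] = {t: set() for t, _, _ in schedule}
    for i, (t1, a1, x1) in enumerate(schedule):
        for t2, a2, x2 in schedule[i + 1:]:
            # Two operations conflict if they touch the same item and one writes.
            if t1 != t2 and x1 == x2 and "w" in (a1, a2):
                graph[t1].add(t2)
    return graph


def is_conflict_serializable(schedule: List[Op]) -> bool:
    """A schedule is conflict serializable iff its precedence graph is acyclic."""
    graph = conflict_graph(schedule)
    visiting: Set[str] = set()
    finished: Set[str] = set()

    def has_cycle(node: str) -> bool:
        visiting.add(node)
        for nxt in graph[node]:
            if nxt in visiting or (nxt not in finished and has_cycle(nxt)):
                return True
        visiting.discard(node)
        finished.add(node)
        return False

    return not any(has_cycle(t) for t in graph if t not in finished)


serial_like = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]
interleaved = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]
results = (is_conflict_serializable(serial_like), is_conflict_serializable(interleaved))
```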
The transaction management problem in FDBSs has attracted a lot of interest from the database community. A number of FDBS transaction management algorithms have been proposed for a failure-free environment. Recently, researchers have addressed the issue of transaction management in a failure-prone environment and a number of proposals have been made [VW 92]. Each proposed algorithm imposes some restrictions affecting different aspects of local autonomy. These restrictions include:
In this project, we adopt the transaction management for federated autonomous systems as developed in [HSL 94]. In their work, the failure problem in FDBS is analyzed and the definition of global serializability is modified.
A view definition mechanism enabling integration of relevant imported schemas of the participating databases for a specific application will be developed in this project. The basic idea in application view oriented schema integration is to consider each query issued in the query language of our federated system as representing a separate user view.
Interest in schema integration techniques [SLCN 88, SM 89, BLN 86, HR 90, Ken 91, NS 88, SN 88, FKN 91, NK 89, DH 84, SM 91] is increasing significantly. The tool to be designed in this project will deliberately separate the schema integration process into parts requiring user intervention and parts that can be fully automated. This will help minimize the human expertise necessary in the process. To address the problems appearing at both the schema level and the instance level, the schema integrator tool to be developed will contain features for entity identification, attribute value conflict resolution, and schematic discrepancy realization.
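Two of these features, entity identification and attribute value conflict resolution, can be illustrated at the instance level with the sketch below. The identification rule (records denote the same entity when their normalized names are equal) and the resolution rule (on conflict, the first source wins) are deliberately simple assumptions, and the function names and sample records are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def normalize(name: str) -> str:
    """Identification key: lower-case the name and collapse repeated spaces."""
    return " ".join(name.lower().split())


def integrate(source_a: List[Record], source_b: List[Record]) -> List[Record]:
    """Merge records that denote the same entity; on conflict prefer source_a."""
    merged: Dict[str, Record] = {}
    for record in source_b + source_a:           # source_a is applied last, so it wins
        key = normalize(record["name"])          # entity identification rule
        entity = merged.setdefault(key, {})
        # Missing values (None) never overwrite a value from the other source.
        entity.update({k: v for k, v in record.items() if v is not None})
        entity["name"] = key
    return [merged[k] for k in sorted(merged)]


source_a = [{"name": "Alice  Smith", "city": "Ankara", "phone": None}]
source_b = [
    {"name": "alice smith", "city": "Izmir", "phone": "555"},
    {"name": "Bob Jones", "city": "Paris", "phone": None},
]
entities = integrate(source_a, source_b)
```

The two Alice records are identified as one entity; the conflicting city is taken from source_a, while the phone number missing there is filled in from source_b.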
[BM 93] C. Beeri, T. Milo. On the power of algebras with recursion. Proc. of the 1993 ACM SIGMOD Conference, Washington, May, 1993.
[BLN 86] C. Batini, M. Lenzerini, and S. B. Navathe. A Comparative Analysis of Methodologies for Database Schema Integration. ACM Computing Surveys, 18(4):323-364, December 1986.
[Bro 93] M. L. Brodie, The Promise of Distributed Computing and the Challenges of Legacy Systems. NATO ASI on Object-oriented database systems, Turkey, August 1993.
[Can 91] P. E. Cannata. The Irresistible Move Towards Interoperable Database Systems. In Proc. 1st Int. Workshop on Interoperability in Multidatabase Systems, April 1991.
[CORBA 92] The Common Object Request Broker: Architecture and Specification. OMG Document Number 91.12.1, Revision 1.1, 1992.
[DH 84] U. Dayal, H.-Y. Hwang. View Definition and Generalization for Database Integration in a Multidatabase System. IEEE Transactions on Software Eng., Vol. 10, No. 6, November 1984.
[DEOO 94] Dogac, A., Evrendilek, C., Okay, T., Ozkan, C., "METU Object-Oriented DBMS", in Advances in Object-Oriented Database Systems, edited by Dogac, A., Ozsu, T., Biliris, A., Sellis, T., pp. 172-198, Springer-Verlag, 1994.
[DOBS 94] Dogac, A., Ozsu, T., Biliris, A., Sellis, T., Advances in Object-Oriented Database Systems, Springer-Verlag, 1994.
[Dog 94] Dogac, A., et al., "METU Object-Oriented Database System", Demo Description, to appear in the Proc. ACM SIGMOD Intl. Conf. on Management of Data, Minneapolis, May 1994.
[Dog 93] A. Dogac, et al. METU object-oriented DBMS. NATO ASI on Object-oriented database systems, Turkey, August 1993.
[FKN 91] P. Fankhauser, M. Kracker, E. L. Neuhold. Semantic vs. structural resemblance of classes. ACM SIGMOD Record, Vol. 20, No. 4, December 1991, pp. 59-63.
[GF 92] Genesereth M.R., Fikes R.E. Knowledge interchange format. Report Logic-92-1, Stanford University, June 1992.
[GV 92] G. Gardarin and P. Valduriez. ESQL: An Object-Oriented SQL with F-Logic Semantics, In Proc. of Data Engineering Conf., February 1992.
[HR 90] S. Hayne and S. Ram. Multi-User View Integration System (MUVIS): An Expert System for View Integration. pages 402-409, 1990.
[HSL 94] S.-Y. Hwang, J. Srivastava, and J. Li. Transaction Recovery in Federated Autonomous Databases. In Distributed and Parallel Databases, Vol. 2, No. 2, pp. 151-182, April 1994.
[Kal 90] L. Kalinichenko. Methods and Tools for Equivalent Data Model Construction. Proceedings of the EDBT'90 International Conference, Venice, March 1990.
[KAN 93] Klas W., Aberer K., Neuhold E. Object-oriented modelling for hypermedia systems using the VODAK model language. NATO ASI on Object-oriented database systems, Turkey, August 1993.
[Ken 91] W. Kent. Solving Domain Mismatch and Schema Mismatch Problems with an Object-Oriented Database Programming Language. In Proc. of the 17th Int. Conf. on Very Large Data Bases, pages 147-160, 1991.
[LA 87] W. Litwin and A. Abdellatif. An Overview of the Multidatabase Manipulation Language MDSL. Proceedings of the IEEE, 75(5): 621-632, May 1987.
[LS 92] E-P. Lim and J. Srivastava. Query optimization/processing in federated database systems. Technical Report 92-68, Dept. of Comp. Sc., University of Minnesota.
[LSPR 93] E-P. Lim, J. Srivastava, S. Prabhakar, and J. Richardson. Entity Identification Problem in Database Integration. In Proc. of 9th IEEE Data Eng. Conf., 1993.
[LST 91] H. Lu, M.-C. Shan, and K.-L. Tan. Optimization of Multiway Join Queries for Parallel Execution. In Proc. of 17th Int. Conf. on Very Large Data Bases, pages 549-560, 1991.
[MS 91] M. C. Murphy and M.-C. Shan. Execution Plan Balancing. In Proc. of the 7th Int. Conf. on Data Engineering, pages 698-706, 1991.
[NK 89] E. J. Neuhold and M. Kracker. Schema Independent Query Formulation. In Proc. of the 8th Int Conf. on Entity-Relationship Approach, pages 233-247, 1989.
[NS 88] E. J. Neuhold, and M. Schrefl. Dynamic Derivation of Personalized Views. In Proc. of the 14th Int. Conf. on Very Large Data Bases, pages 183-194, August 1988.
[ODMG 94] The Object Database Standard: ODMG-93. Edited by R. G. G. Cattell, Morgan Kaufmann Publishers, 1994.
[SL 90] A. P. Sheth and J. A. Larson. Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases. ACM Computing Surveys, 22(3): 183-236, September 1990.
[SLCN 88] A. Sheth, J. Larson, A. Cornelio, and S. B. Navathe. A Tool for Integrating Conceptual Schemata and User Views. In Data Engineering, pages 176-183, 1988.
[SM 89] Siegel M., Madnick S.E. Identification and reconciliation of semantic conflicts using metadata. MIT WP N 3102-89 MSA, November 1989.
[SM 91] Siegel M., Madnick S.E. A Metadata Approach to Resolving Semantic Conflicts. In Proc. of the 17th Int. Conf. on Very Large Data Bases, September, 1991.
[SN 88] M. Schrefl and E. J. Neuhold. Class Definition by Generalization Using Upward Inheritance. In Data Engineering, pages 4-13, 1988.
[SRL 93] L. Suardi, M. Rusinkiewicz, W. Litwin. Execution of Extended Multidatabase SQL. In Proc. of 9th Int. Conf. on Data Engineering, 1993.
[Tho 90] Thomas G, et al. Heterogeneous distributed database systems for production use. ACM Computing Surveys, vol. 22, No. 3, September 1990.
[Tro 93] De Troyer O.M.F. On data schema transformation. Ph. D. Thesis, Brabant Univ., 1993.
[VW 92] J. Veijalainen and A. Wolski, Prepared and Commit Certification for Decentralized Transaction Management in Rigorous Heterogeneous Multidatabases, In Proc. of the 8th Int. Conf. on Data Engineering, Feb. 1992.
[Wei 91] G. Weikum. Principles and Realization Strategies of Multi-Level Transaction Management. In ACM TODS Vol. 16 No. 1, 1991.
[YC 84] C. Yu and C. Chang. Distributed Query Processing. In ACM Computing Surveys Vol. 16, pp. 399-433, 1984.