Sign up & Download
Sign in

A relational model of data for large shared data banks. 1970.

by E F Codd
MD computing computers in medical practice ()

Abstract

Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information. Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user's model.

Cite this document (BETA)

Page 1
hidden

A relational model of data for la...

Information Retrieval P. BAXENDALE, Editor A Relational Model of Data for Large Shared Data Banks E. F. CODD IBM Research Laboratory, San Jose, California Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information. Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain opera- tions on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user���s model. KEY WORDS AND PHRASES: data bank, data base, data structure, data organization, hierarchies of data, networks of data, relations, derivability, redundancy, consistency, composition, join, retrieval language, predicate calculus, security, data integrity CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22, 4.29 1. Relational Model and Normal Form 1 .I. INTR~xJ~TI~N This paper is concerned with the application of ele- mentary relation theory to systems which provide shared access to large banks of formatted data. Except for a paper by Childs [l], the principal application of relations to data systems has been to deductive question-answering systems. Levein and Maron [2] provide numerous references to work in this area. In contrast, the problems treated here are those of data independence-the independence of application programs and terminal activities from growth in data types and changes in data representation-and certain kinds of data inconsistency which are expected to become troublesome even in nondeductive systems. Volume 13 / Number 6 / June, 1970 The relational view (or model) of data described in Section 1 appears to be superior in several respects to the graph or network model [3,4] presently in vogue for non- inferential systems. It provides a means of describing data with its natural structure only-that is, without superim- posing any additional structure for machine representation purposes. Accordingly, it provides a basis for a high level data language which will yield maximal independence be- tween programs on the one hand and machine representa- tion and organization of data on the other. A further advantage of the relational view is that it forms a sound basis for treating derivability, redundancy, and consistency of relations-these are discussed in Section 2. The network model, on the other hand, has spawned a number of confusions, not the least of which is mistaking the derivation of connections for the derivation of rela- tions (see remarks in Section 2 on the ���connection trap���). Finally, the relational view permits a clearer evaluation of the scope and logical limitations of present formatted data systems, and also the relative merits (from a logical standpoint) of competing representations of data within a single system. Examples of this clearer perspective are cited in various parts of this paper. Implementations of systems to support the relational model are not discussed. 1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS The provision of data description tables in recently de- veloped information systems represents a major advance toward the goal of data independence [5,6,7]. Such tables facilitate changing certain characteristics of the data repre- sentation stored in a data bank. However, the variety of data representation characteristics which can be changed without logically impairing some application programs is still quite limited. Further, the model of data with which users interact is still cluttered with representational prop- erties, particularly in regard to the representation of col- lections of data (as opposed to individual items). Three of the principal kinds of data dependencies which still need to be removed are: ordering dependence, indexing depend- ence, and access path dependence. In some systems these dependencies are not clearly separable from one another. 1.2.1. Ordering Dependence. Elements of data in a data bank may be stored in a variety of ways, some involv- ing no concern for ordering, some permitting each element to participate in one ordering only, others permitting each element to participate in several orderings. Let us consider those existing systems which either require or permit data elements to be stored in at least one total ordering which is closely associated with the hardware-determined ordering of addresses. For example, the records of a file concerning parts might be stored in ascending order by part serial number. Such systems normally permit application pro- grams to assume that the order of presentation of records from such a file is identical to (or is a subordering of) the Communications of the ACM 377
Page 2
hidden
stored ordering. Those application programs which take advantage of the stored ordering of a file are likely to fail to operate correctly if for some reason it becomes necessary to replace that ordering by a different one. Similar remarks hold for a stored ordering implemented by means of pointers. It is unnecessary to single out any system as an example, because all the well-known information systems that are marketed today fail to make a clear distinction between order of presentation on the one hand and stored ordering on the other. Significant implementation problems must be solved to provide this kind of independence. 1.2.2. Indexing Dependence. In the context of for- matted data, an index is usually thought of as a purely performance-oriented component of the data representa- tion. It tends to improve response to queries and updates and, at the same time, slow down response to insertions and deletions. From an informational standpoint, an index is a redundant component of the data representation. If a system uses indices at all and if it is to perform well in an environment with changing patterns of activity on the data bank, an ability to create and destroy indices from time to time will probably be necessary. The question then arises: Can application programs and terminal activities remain invariant as indices come and go? Present formatted data systems take widely different approaches to indexing. TDMS [7] unconditionally pro- vides indexing on all attributes. The presently released version of IMS [5] provides the user with a choice for each file: a choice between no indexing at all (the hierarchic se- quential organization) or indexing on the primary key only (the hierarchic indexed sequent,ial organization). In neither case is the user���s application logic dependent on the existence of the unconditionally provided indices. IDS [8], however, permits the fle designers to select attributes to be indexed and to incorporate indices into the file struc- ture by means of additional chains. Application programs taking advantage of the performance benefit of these in- dexing chains must refer to those chains by name. Such pro- grams do not operate correctly if these chains are later removed. 1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. Application programs developed to work with these sys- tems tend to be logically impaired if the trees or networks are changed in structure. A simple example follows. Suppose the data bank contains information about parts and projects. For each part, the part number, part name, part description, quantity-on-hand, and quantity-on-order are recorded. For each project, the project number, project name, project description are recorded. Whenever a project makes use of a certain part, the quantity of that part com- mitted to the given project is also recorded. Suppose that the system requires the user or file designer to declare or define the data in terms of tree structures. Then, any one of the hierarchical structures may be adopted for the infor- mation mentioned above (see Structures l-5). 378 Communications of the ACM Structure 1. Projects Subordinate to Parts File Segment Fields F PART part # part name part description quantity-on-hand quantity-on-order PROJECT project # project name project description quantity committed Structure 2. Parts Subordinate to Projects File Sqmeut Fields F PROJECT project # project name project description PART part # part name part description quantity-on-hand quantity-on-order quantity committed Structure 3. Parts and Projects as Peers Commitment Relationship Subordinate to Projects File Segment Fields F PART part # part name part description quantity-on-hand quantity-on-order G PROJECT project # project name project description PART part # quantity committed Structure 4. Parts and Projects as Peers Commitment Relationship Subordinate to Parts File Segnren1 Fields F PART part # part description quantity-on-hand quantity-on-order PROJECT project # quantity committed G PROJECT project # project name project description Structure 5. Parts, Projects, and Commitment Relationship as Peers FCZC .%&-,,ZC,,t Ficlds F PART part # part name part description quantity-on-hand quantity-on-order G PROJECT project # project name project description H COMMIT part # project # quantity committed Volume 13 / Number 6 / June, 1970

Readership Statistics

261 Readers on Mendeley
by Discipline
 
 
 
by Academic Status
 
30% Ph.D. Student
 
21% Student (Master)
 
8% Other Professional
by Country
 
28% United States
 
10% Germany
 
8% United Kingdom

Sign up today - FREE

Mendeley saves you time finding and organizing research. Learn more

  • All your research in one place
  • Add and import papers easily
  • Access it anywhere, anytime

Start using Mendeley in seconds!

Already have an account? Sign in