Relational database

E. F. Codd

Communications of the ACM (1982) 25(2) 109-117

DOI: 10.1145/358396.358400

Abstract

The verb of the predicate has become a relation name, KILLED, and the variables x and y have become attribute names, KILLER and VICTIM, defined in the relation schema of this relation. Associated with each attribute name, but not shown in the above representation, is an underlying domain, the set of permissible values for the attribute in question. In this case, both attributes would draw their values from the same domain, "names of Shakespearean characters." A particular instantiation of a predicate in n variables is represented by an n-tuple. Thus, the 2-tuple (Brutus, Caesar), in combination with the relation schema of KILLED, represents the proposition "Brutus killed Caesar."

Arising from the visual representation of a relation are several informal terms in common use:

+ Table, for relation.
+ Heading, for relation schema.
+ Column (name), for attribute (name).
+ Row, for (n-)tuple.
+ Body (or extension), for the set of tuples "in" the relation.

Four important principles are illustrated in the above example:

+ At each intersection of a row and column there is exactly one value. This is the principle of first normal form, fundamental in the relational model. While in natural language we might say "Hamlet killed Laertes and Polonius," the relational model does not allow us to put Laertes and Polonius in the same row and so requires us to say "Hamlet killed Laertes" and "Hamlet killed Polonius."
+ The order in which the rows are written is unimportant. The information conveyed (the single proposition formed by inserting the word "and" between the rows) is the same regardless of the order.
+ The order in which the columns are written is also unimportant. It is only important to know, for each value in a row, to which column that value pertains, and we achieve that by writing the value underneath the name of its column.
+ Writing the same row more than once is as redundant as would be writing the same proposition twice with the word "and" in between.
Such redundancy can only confuse. For instance, if we had (Brutus, Caesar) twice, we would have to be very careful how we phrase the query that asks "How many people did Brutus kill?" The Relational Model expressly prohibits duplicate rows.

A relational database is a collection of relations (more precisely, relation variables, to allow changes in the "contents" to reflect changes in the state of the enterprise, while the relation schemas do not change). A relational database schema is a collection of relation schemas, along with a collection of domain definitions, with the possible addition of integrity rules (usually known as constraints), access authorizations, and so on.

A relational database management system (DBMS) must minimally provide for the definition of domains and relation schemas; the insertion, updating, and deletion of tuples; and a relational query language for defining new relations that may be derived from the "base relations" of the database.

As of mid-1999, no well-known commercial product quite matches up to these stated requirements of a relational DBMS. Those based on the standard database language SQL (Structured Query Language) are commonly called relational DBMSs. However, SQL's concept of "tables," though similar to that of relations, turns out on close scrutiny to deviate in several important respects from the Relational Model. Further, SQL DBMSs have been particularly lacking in the area of domains. By this we do not mean failure to support the relational concept of domains, the concept now often referred to as data types (q.v.) or object classes (see CLASS), where a domain (or type or class) is a named set of values accompanied by a set of operators for operating on those values.
Rather, we mean that SQL supports only a specific and very limited collection of domains, these being the data types (to use the SQL term) that it provides for representing and operating on numbers, character strings, dates, times, and typically nothing else. Much work is currently under way to address this deficiency by providing comprehensive support for user-defined data types of arbitrary complexity. (Unfortunately, the current international standard for SQL does define a construct that goes by the name "domain," but this is not the concept referred to by that term in relational theory.)

A relational query language is one that embodies the fundamental principle that the operands and the result of any operator in the query language are relations. If query operations are thus closed over relations, then queries of arbitrary complexity can be expressed. In practice, to achieve this end, relational query languages are founded on either or both of the relational algebra and the relational calculus proposed by Codd. Of these two, the algebra is considered, psychologically, to be the "lower-level" system (in the same sense in which programming languages are often described as "low-level" or "high-level"), but in fact the two
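As a rough illustration of the ideas in the abstract above, the following Python sketch models a relation as a heading plus a set of tuples, so that row order, column order, and duplicate rows carry no information. It also shows two algebra-style operators whose operands and results are both relations, which is what allows them to be nested. The function names relation, restrict, and project, and the in-memory representation, are assumptions made for this illustration only; they are not taken from the article or from any DBMS.

```python
def relation(heading, rows):
    """Build a relation as (heading, body).

    heading: a set of attribute names.
    body: a frozenset of tuples, each tuple a frozenset of
          (attribute, value) pairs, so neither row order nor
          column order is recorded and duplicates collapse.
    """
    heading = frozenset(heading)
    body = set()
    for row in rows:
        # Every row must supply exactly one value per attribute.
        if set(row) != heading:
            raise ValueError(f"row {row!r} does not match heading {sorted(heading)}")
        body.add(frozenset(row.items()))
    return heading, frozenset(body)


def restrict(rel, predicate):
    """Keep the tuples that satisfy predicate; the result is a relation."""
    heading, body = rel
    return heading, frozenset(t for t in body if predicate(dict(t)))


def project(rel, attrs):
    """Keep only the named attributes; the result is a relation."""
    heading, body = rel
    attrs = frozenset(attrs)
    if not attrs <= heading:
        raise ValueError(f"unknown attributes: {sorted(attrs - heading)}")
    return attrs, frozenset(
        frozenset((a, v) for a, v in t if a in attrs) for t in body
    )


killed = relation(
    {"KILLER", "VICTIM"},
    [
        {"KILLER": "Brutus", "VICTIM": "Caesar"},
        {"VICTIM": "Caesar", "KILLER": "Brutus"},  # same row, columns written in another order
        {"KILLER": "Hamlet", "VICTIM": "Laertes"},
        {"KILLER": "Hamlet", "VICTIM": "Polonius"},
    ],
)

# Operators nest because each one takes and returns a relation.
brutus_victims = project(
    restrict(killed, lambda t: t["KILLER"] == "Brutus"), {"VICTIM"}
)

print(len(killed[1]))          # 3: the repeated (Brutus, Caesar) row is stored once
print(len(brutus_victims[1]))  # 1: "How many people did Brutus kill?"
```

Because the body is a set, writing (Brutus, Caesar) twice cannot change the answer to the counting query, which is the difficulty the abstract raises about duplicate rows.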

Codd, E. F. (1982). Relational database. Communications of the ACM, 25(2), 109–117. https://doi.org/10.1145/358396.358400
