SQL databases (relational databases)
A Relational Database, also under the acronym (RDB), refers to a specific type of database based on the relational model, as proposed in the 1970s by the English computer scientist Edgar Frank Codd.
Under the relational model paradigm, we can define a relational database (RDB) as a database (DB) that stores and allows access to sets of data related to each other under certain clear, specific and well-defined rules.
A Relational Database(RDB) uses tables to organize data, those tables are structured in rows (records) and columns (fields). Tables can be related to each other by a common field which is usually known as ID or Key.
Before delving into the relational model and the different types of relationships is it necessary to correctly understand what is a database table and its components. Once these concepts are clarified, it will be easier to understand how data relationships are established.
What is a table in a database?
Tables are database objects or structures whose main purpose is to store data. Tables contain rows and columns also known as records and fields. Let’s delve into what each of them are:
-
Fields: The fields of a table correspond to the name of each column. They must be unique and have an associated data type. (
VARCHAR, INTEGER, DATE, BIT
…) -
Record: a record, also called a row or tuple, is the set of fields for each row in the table. To put it more simply, we can say that each row of data in a table makes up a record. Each record must be unique and not duplicated, which means that at least one of its fields must never be repeated. This data will generally correspond to the identifier or Primary Key that we will explain later.
To understand these concepts, let’s assume that we are working with a simple table named students with the following fields and records:
student_id | name | last name |
---|---|---|
1 | Noah | Williams |
2 | Emma | Smith |
3 | Liam | Jones |
Looking at this example we can draw the following conclusions:
-
The students table has three fields
student_id
(INT identity),name
(VARCHAR) andlast name
(VARCHAR)). -
The students table consist of three records or rows (1 Noah Williams forms the first record, 2 Emma Smith the second, and so on.)
-
The
student_id
field is it is unique and unrepeatable for each record. This field will be defined as Primary Key.
To better understand how tables work and relate to each other, it is necessary to understand the different types of keys that a table can have.
Keys
A Primary Key is a key formed by a table field (or combination of fields) designated to uniquely identify each table record in the database. A Primary Key must contain an unique value for each row of data and it can’t never contain null values.
Primary Key
In the students table example, the student_id
field meets all the Primary Key requirements. It does not have a null value, it is not repeated and it allows each record to be uniquely identified.
Unique Key
An Unique Key as in the previous case is also a set of one or more than one fields/columns of a table that uniquely identify a record in a database table. It is mainly used to prevent duplicate values from being inserted into a specific column or combination of columns that are part of the UNIQUE
constraint and are not part of the Primary Key.
Unlike the Primary Key, the Unique Key allows us to generate indexes that include null values in some of its fields. In addition, it improves data recovery performance through faster and more efficient searches.
If we include one more field, wich does not allow nulls, named
driversLicense
in thestudents
table from the example, if we wanted, we could use this new field as Unique Key since it is not repeated and uniquely identifies each record. We could also determine our Unique Key as the combination of the fieldslastName
anddriversLicense
. That way even iflastNSame
allowed nulls, we could continue to identify our records only through thedriversSlicense
field.
studentId | name | lastName | driversLicense |
---|---|---|---|
1 | Noah | Williams | PXXXXXXX |
2 | Emma | Smith | MXXXXXXX |
3 | Liam | Jones | AXXXXXXX |
Foreign Key
A Foreign Key is a column or group of columns in a relational database table that provides a link between data in two tables. Said key is linked directly to the Primary Key of the main table with which the relationship is established. This means that the Foreign Key can only contain values that exist in the Primary Key of the related table, ensuring that the references between both keys are maintained.
The Foreign Key controls and prevents a Primary Key from being modified and also prevents deleting a record if a Foreign Key exists in another table that refers to that record. This inviolable rule is known as referential integrity, and it ensures that there is always cohesion and coherence between the data in the different tables of a database.
Referential integrity is one of the most important properties of relational databases and ensures both data integrity and relationship synchronization in delete and update operations.
To explain the Foreign Key in a more graphic way, let’s suppose that apart from the students
table we have a second table called academicRecord
:
academicRecordId | status | date | studentId |
---|---|---|---|
1 | open | 06-21-2022 | 2 |
2 | closed | 3 | |
3 | closed | 1 |
Looking at this example we can draw the following conclusions:
-
The
academicRecord
table has a Primary Key (academicRecordId
) that uniquely identifies each record. -
The
academicRecord
table is related to the student table through the Foreign KeystudentId
. This key is linked to the Primary KeystudentId
of thestudents
table, establishing a relationship between both tables. -
A relationship 1 to 1 is being established, since a student can only have one academic record and an academic record can only belong to one student. The following section explains the types of relationships
The relationship between the
students
andacademicRecord
tables allows us, among other things, to get information from both tables from a single query. For example, we could display in a single query the student’s name along with the status of their file, both fields belonging to independent but related tables.
As an example of referential integrity,let’s suppose that we want to delete directly from the
students
table the first record, referring to Noah Williams. This action would give us a Foreign Key violation error since that record is linked to theacademicRecord
table. If we wanted to delete the database record , we need to make sure that we have deleted it from all the tables in which is referenced. This is known as cascade delete and/or update,it being another of the key aspects of relational databases.
Candidate Key
A Candidate Key is a field or set of fields that, without being defined as a Primary Key, meet the specific conditions to identify each record in a unique way.
The
driversLicense
field of thestudents
table would be a good example of Candidate Key, since it is unrepeatable, it cannot be null and it only allows identifying a single record.
Alternate Key
An Alternate Key is none other than Candidate Key or a secondary candidate key that that meets the conditions of a Primary Key,so it is capable of identifying a row uniquely. However, such a key is not used as a Primary Key, because only one key is selected as the primary key. Thus, the other remaining keys are known as Alternate Keys.
In order to better understand the difference between a Candidate Key and an Alternate Key let’s consider the following scenario. When a table is defined, all fields that can be defined as Primary Key are essentially Candidate Key. Once a Primary Key has been set by the database administrator, the rest of the Candidate Key that have not been used become Alternate Key as secondary keys.
Therefore, we can deduce that in essence Candidate Key and Alternate Key are the same thing, only differing in their nomenclature when the Primary Key of the table is defined.
Composite Key
A Composite Key is nothing more than a key than in order to uniquely identify a record and be able to be used as a Primary Key requires the use of two or more fields, which is why it is also known as concatenated key. The fields of said compound key by themselves are not enough to uniquely identify a record, so it will be dependent on several fields.
To understand this concept, let us suppose the case of a table in which we would like to identify a record by username, instead of having a sequential identifier like Primary Key
name | lastName | dateOfBirth | nationality |
---|---|---|---|
James | Moore | 01-11-1989 | American |
James | Moore | 24-07-2001 | American |
Yixing | Zhang | 24-07-2001 | Chinese |
Let’s say we wanted to identify the first record in the table. Neither field by itself meets the conditions to uniquely identify it as a Primary Key. However, we could create a Composite Key that can be used as a Primary Key by concatenating the fields name
, lastName
and dateOfBirth
, being the three data necessary to be able to uniquely identify each record.
Even in the previous example, it exists the remote possibility that there was a user who coincided with another person already registered both in name
, lastName
and dateOfBirth
. If that were the case,that person’s date of birth could not be inserted in the database for violating the rule of uniqueness that every Primary Key must have. This highlights the importance of using fields with unrepeatable values, such as some type of unique identifier or a sequential code.
Super Key
A Super Key is a set of one or more attributes that can uniquely identify a row in a table. We could say that the Super Key are all the possible attribute combinations that allow us to identify a record. These subsets or Super Key allow the DB administrator to select the candidate keys by filtering those that lack redundant attributes for the identification of a record.
We will try to use Primary Key as simple and atomic as possible, since it will make it easier for us to maintain and define relationships in our database.
Relational Model
The relational model, as its name suggests, is a data model based on the mathematical concept of the relationship between the different tables that make up a database. This model is based on predicate logic and set theory and continues to be the most widely used model for shaping and dynamically managing data .
As we have seen before, it is based on the definition of tables with their respective rows, columns, identifiers and the relationship of the tables with each other through a Foreign Key.
Advantages of the relational model
Using the relational model to manage databases mainly offers the following advantages over other database models:
- It makes easy the management and administration of large amounts of data in a secure and uniform way.
- It has tools and processes that avoid duplication of data.
- It guarantees the referential integrity, ensuring cohesion and coherence of data. To delete or update a record, the system ensures that the change in the database is applied in cascade to all the records dependent on the related tables.
- It allows the normalization of the databases as a process that facilitates the application of a series of rules in the relationships with the aim of minimizing the redundancy of data and facilitating their management.
ACID Model
When we talk about databases it is very common to hear the acronym ACID, wich refers to the four key properties of a transaction (Atomicity, Consistency, Isolation and Durability) . The ACID model is a set of properties applicable to databases with the aim of guaranteeing the security of transactions.
A database transaction is a sequence of multiple operations performed on a database, and all served as a single logical unit of work, taking place wholly or not at all. In other words, if any operation fails during the transaction, the entire process is invalidated, ensuring the integrity of the database.
Having clarified this point, let’s define the four basic properties of the ACID model:
-
Atomicity: in terms of a transaction means all or nothing. This property ensures that the transaction behaves as an atomic unit. As we have mentioned before, this property guarantees that for a transaction to complete successfully, all the parts of which it is composed must have worked previously. If any part is wrong, the whole thing fails.
-
Consistency: this property is primarily responsible for ensuring the referential integrity of our database. In other words, the system guarantees the execution of valid transactions that continue to maintain the data integrity according to the applied rules.
-
Isolation: with multiple concurrent transactions running at the same time, each transaction should be kept independent without affecting other transactions executing simultaneously, in order to prevent any error.
-
Durability: this means that after a transacción successfully completes, changes to data persist and are not undone. Ensuring correct data persistence, even in the event of a system failure.