Flat Tables Databases

Chester Supelana

January 26, 2021

Flat Table Implementation

When we go to Youtube, we would use the search bar to find the videos we want to watch. Afterwards, a list of videos with their different information is displayed on the screen.

<br/>

<div> <img src="https://strapi.saperium.com/uploads/flat_table_1_4336c2cc9f.png" class="img-fluid" </div>

<br/> <br/>

In this case, each video row contains the title, description, number of views, and among other things, as well as the name of the user who uploaded the video and whether the uploader is verified or not (specified by the small check mark beside the name). If we were to design a normal relational database for this, we can use 2 tables to create this list --- Users and Videos.

<br/>

<div class="row">

<div class="col-12 col-md-5"> <div class="table-responsive">

<table class="table"> <thead> <tr> <th align="left"> Users </th> </tr> </thead> <tbody> <tr> <td align="left"> <strong>id</strong>: string </td> </tr> <tr> <td align="left"> <strong>name</strong>: string </td> </tr> <tr> <td align="left"> <strong>email</strong>: string </td> </tr> <tr> <td align="left"> <strong>password</strong>: string </td> </tr> <tr> <td align="left"> <strong>is_verified</strong>: boolean </td> </tr> </tbody> </table> </div>

</div>

<div class="col-12 offset-md-2 col-md-5"> <div class="table-responsive">

<table class="table"> <thead> <tr> <th align="left"> Videos </th> </tr> </thead> <tbody> <tr> <td align="left"> <strong>id</strong>: string </td> </tr> <tr> <td align="left"> <strong>title</strong>: string </td> </tr> <tr> <td align="left"> <strong>description</strong>: string </td> </tr> <tr> <td align="left"> <strong>user_id</strong>: string </td> </tr> <tr> <td align="left"> <strong>views</strong>: integer </td> </tr> <tr> <td align="left"> <strong>thumbnail_url</strong>: string </td> </tr> </tbody> </table> </div> </div>

</div>

<br/>

To create these tables in mySQL, we can use these statements:

CREATE TABLE `Users`(
    id VARCHAR(25) NOT NULL,
    name VARCHAR(25) NOT NULL,
    email VARCHAR(25) NOT NULL,
    password VARCHAR(25) NOT NULL,
    is_verified TINYINT(1) NOT NULL,
    PRIMARY KEY (id)
);

CREATE TABLE `Videos`(
    id VARCHAR(25) NOT NULL,
    title VARCHAR(50) NOT NULL,
    description VARCHAR(500) NOT NULL,
    user_id VARCHAR(25),
    views INT NOT NULL,
    thumbnail_url VARCHAR(200),
    PRIMARY KEY (id),
    FOREIGN KEY (user_id) REFERENCES `Users` (id)
);

<br/>

Notice that we added a foreign key to reference the user_id in the Videos table with the id in the Users table. This is to add a relationship between the video stored in Videos with the user who uploaded it that is stored in Users. In other words, it allows us to connect a column in one table to a column in another table.

Then to make a query similar to our previous example (in a real application, our query would probably be more complex), we can use this SELECT statement:

SELECT
  v.id as video_id,
  u.id as user_id,
  u.name,
  u.is_verified,
  v.title,
  v.description,
  v.views,
  v.thumbnail_url
FROM Videos v
INNER JOIN Users u
  ON u.id = v.user_id
WHERE
  v.title LIKE '%how to make rice%';

<br/>

As we can see with this query, we needed to join the two tables in order to materialize the list of videos. We can optimize this by changing how we implemented our database. Instead of joining the tables to get information about the uploader, we can simply just store the user info on the Videos table!

<br/>

Alternative database implementation

As stated before, we can add an optimization to our query by adding additional columns in our tables to avoid needing to use joins. This implementation is called Flat Table and the essence of this implementation is rather than building your database schema based on your entities (ex. videos or users) and its relationships, your schema will now be dictated by the view or the UI. In order to do that, we can simply just store everything that we can see on the listing item in a single row.

Before implementing a flat table database, let us first understand how we can benefit from using it.

<br/>

Easy to set up

The most notable feature of the flat table database is its simplicity. Since all we are doing is storing everything we need in a single table, it will be easier to build a database schema. This also allows us to create new features/tables faster and easier.

Simpler queries

Another advantage of using a flat table database is that we can write faster queries much more easily. Since we are querying only 1 table, the amount of reads executed are reduced. If we are able to complement this by using a good index on that table, then we can drastically reduce the amount of reads that the DBMS would otherwise do.

<br/>

While these advantages might seem enticing, it is worth knowing the disadvantages that come along with it.

<br/>

Data redundancy/duplication

In a flat table database, the same piece of information can be repeated across multiple rows. Using our Youtube example from above, our Videos table can look like this:

<div class="table-responsive"> <table class="table"> <thead> <tr> <th align="left"> id </th> <th align="left"> title </th> <th align="left"> user_id </th> <th align="left"> views </th> <th align="left"> user_name </th> <th align="left"> ... </th> </tr> </thead> <tbody> <tr> <td align="left"> 4554 </td> <td align="left"> Minecraft vid #1 </td> <td align="left"> 1001 </td> <td align="left"> 255432 </td> <td align="left"> <strong>Piediepew</strong> </td> <td align="left"> ... </td> </tr> <tr> <td align="left"> 4555 </td> <td align="left"> Minecraft vid #2 </td> <td align="left"> 1001 </td> <td align="left"> 200321 </td> <td align="left"> <strong>Piediepew</strong> </td> <td align="left"> ... </td> </tr> <tr> <td align="left"> 4556 </td> <td align="left"> How cheese is made </td> <td align="left"> 1454 </td> <td align="left"> 53324
</td> <td align="left"> How3Basic </td> <td align="left"> ... </td> </tr> <tr> <td align="left"> 4557 </td> <td align="left"> Minecraft vid #3 </td> <td align="left"> 1001 </td> <td align="left"> 182772 </td> <td align="left"> <strong>Piediepew</strong> </td> <td align="left"> ... </td> </tr> <tr> <td align="left"> 4558 </td> <td align="left"> C++ Tutorial Ep. 1 </td> <td align="left"> 1954 </td> <td align="left"> 120012 </td> <td align="left"> ProgrammersHub </td> <td align="left"> ... </td> </tr> </tbody> </table> </div>

<br/>

In this table, the entry “Piediepew” is repeated multiple times in the table. This can cause some problems. For example, if a user changes his/her user name, then all rows in the Videos table containing that user name will also need to be updated -- as opposed to only updating a single row in the Users table.

No referential integrity

Another disadvantage is losing referential integrity. This means we will not have a way to create a relationship between different tables. For example, given these tables:

<div class="row">

<div class="col-12 col-md-4">

Users:

<div class="table-responsive"> <table class="table"> <thead> <tr> <th align="left"> id </th> <th align="left"> name </th> <th align="left"> ... </th> </tr> </thead> <tbody> <tr> <td align="left"> 1001 </td> <td align="left"> Piediepew </td> <td align="left"> ... </td> </tr> <tr> <td align="left"> 1002 </td> <td align="left"> HigaHiga </td> <td align="left"> ... </td> </tr> </tbody> </table> </div> </div>

<div class="col-12 offset-md-2 col-md-6">

Videos:

<div class="table-responsive"> <table class="table"> <thead> <tr> <th align="left"> id </th> <th align="left"> user_id </th> <th align="left"> user_name </th> <th align="left"> ... </th> </tr> </thead> <tbody> <tr> <td align="left"> 4554 </td> <td align="left"> 1001 </td> <td align="left"> Piediepew
</td> <td align="left"> ... </td> </tr> <tr> <td align="left"> 4555 </td> <td align="left"> 1001 </td> <td align="left"> Piediepew
</td> <td align="left"> ... </td> </tr> </tbody> </table> </div> </div>

</div>

Without referential integrity enforced, the database will see the id in Users as completely unrelated and distinct from the user_id in Videos. This can cause inconsistencies within your database. For example, if user Piediepew’s account is updated (or deleted) in the Users table, his/her videos will not automatically change in any way because the database does not see that they are actually related. So even though Piediepew’s user account does not exist, there can be videos where the uploader field will still show Piediepew’s account! With this, the developer is then responsible to maintain the integrity of the data.

Using our example once again, to make sure the user names in the Videos table are correct, we can create a stored procedure to update the user_name in Videos every time the name in Users is updated:

CREATE PROCEDURE update_videos_name (
    IN p_user_id VARCHAR(25),
    IN p_user_name VARCHAR(25)
)
BEGIN
    UPDATE `Videos`
        SET user_name = p_user_name
    WHERE user_id = p_user_id;
END;

<br/>

To use this, simply call it with the correct parameters after updating the user name:

-- Change user name of id 1001 to ‘Pewdiepie’
UPDATE `Users`
    SET name = 'Pewdiepie'
WHERE id = '1001';

-- Also change it in Videos table
CALL update_videos_name('1001', 'Pewdiepie');

Database is dependent on the UI

Another thing to consider when using a flat table database is that the UI is now tied to your database. This means when the UI changes, the tables may also change. Also, each listing in your application would require its own table which could further increase data redundancy.

At this point, it is worth noting that using a flat table database causes reads to be much faster while making writes slower. If it is important for users of the application to be able to view data consistently fast (e.g. in an auction application where time is crucial), we can opt to sacrifice some write performance (assuming they are implemented correctly) for an even faster read performance. In this case, a flat table database might be more suited than using an implementation with slower reads but faster writes.

<br/>

Using Flat Table Implementation

To add this optimization on our video listing query, first we must change the Videos table. In our example, we want to display all of the information of each video, including the name of the uploader and the uploader’s verified status.

DROP TABLE IF EXISTS `Videos`;
CREATE TABLE `Videos`(
    id VARCHAR(25) NOT NULL,
    title VARCHAR(50) NOT NULL,
    description VARCHAR(500) NOT NULL,
    user_id VARCHAR(25) NOT NULL,
    views INT NOT NULL,
    thumbnail_url VARCHAR(200),
    user_name VARCHAR(25) NOT NULL,
    user_is_verified TINYINT(1) NOT NULL,
    PRIMARY KEY (id)
);

<br/>

In this new table, we added 2 new columns, user_name and user_is_verified. We also removed the foreign key declaration because we don’t need a link to the Users table anymore. This way, since our Videos table already contains all the information we need to display the listing, we can write our query without needing to join Videos with Users anymore.

SELECT
  id as video_id,
  user_id,
  user_name,
  user_is_verified,
  title,
  description,
  views,
  thumbnail_url
FROM Videos
WHERE
  title LIKE '%how to make rice%';