Launch Week 12: Day 3

Learn more

SQL or NoSQL? Why not use both (with PostgreSQL)?

2022-11-24

18 minute read

It's a tough decision for any developer starting a new project. Should you store your data in a standard, time-tested SQL database, or go with one of the newer NoSQL document-based databases? This seemingly simple decision can literally make or break your project down the line. Choose correctly and structure your data well, and you may sail smoothly into production and watch your app take off. Make the wrong choice and you could be headed for nightmares (and maybe even some major re-writes) before your app ever makes it out the door.

Simplicity vs Power

There are tradeoffs with both SQL and NoSQL solutions. Typically it's easier to get started with NoSQL data structures, especially when the data is complex or hierarchical. You can just take a JSON data object from your front-end code and throw it in the database and be done with it. But later when you need to access that data to answer some basic business questions, it's much more difficult. A SQL solution makes it easier to gather data and draw conclusions down the line. Let's look at an example:

Each day I track the food I eat, along with the number of calories in each item:

DayFood ItemCaloriesMeal
01 JanApple72Breakfast
01 JanOatmeal146Breakfast
01 JanSandwich445Lunch
01 JanChips280Lunch
01 JanCookie108Lunch
01 JanMixed Nuts175Snack
01 JanPasta/Sauce380Dinner
01 JanGarlic Bread200Dinner
01 JanBroccoli32Dinner

I also track the number of cups of water I drink and when I drink them:

DayTimeCups
Jan 0108:151
Jan 0109:311
Jan 0110:422
Jan 0112:072
Jan 0114:581
Jan 0117:151
Jan 0118:401
Jan 0119:051

And finally, I track my exercise:

DayTimeDurationExercise
Jan 0111:020.5Walking
Jan 0209:440.75Bicycling
Jan 0217:000.25Walking

For each day I also track my current weight along with any notes for the day:

DayWeightNotes
Jan 01172.6This new diet is awesome!
Jan 14170.2Not sure all this is worth it.
Jan 22169.8Jogged past a McDonald's today. It was hard.
Feb 01168.0I feel better, but sure miss all that greasy food.

Gathering All That Data

That's a lot of different data that needs to be gathered, stored, retrieved, and later analyzed. It's organized simply and easily, but the number of records varies from day to day. On any given day I may have zero or more entries for food, water, and exercise, and I may have zero or one entry for weight & notes.

In my app, I gather all the data for a single day on one page, to make it easier for my users. So, I get a JSON object for each day that looks like this:


_28
{
_28
"date": "2022-01-01",
_28
"weight": 172.6,
_28
"notes": "This new diet is awesome!",
_28
"food": [
_28
{ "title": "Apple", "calories": 72, "meal": "Breakfast" },
_28
{ "title": "Oatmeal", "calories": 146, "meal": "Breakfast" },
_28
{ "title": "Sandwich", "calories": 445, "meal": "Lunch" },
_28
{ "title": "Chips", "calories": 280, "meal": "Lunch" },
_28
{ "title": "Cookie", "calories": 108, "meal": "Lunch" },
_28
{ "title": "Mixed Nuts", "calories": 175, "meal": "Snack" },
_28
{ "title": "Pasta/Sauce", "calories": 380, "meal": "Dinner" },
_28
{ "title": "Garlic Bread", "calories": 200, "meal": "Dinner" },
_28
{ "title": "Broccoli", "calories": 32, "meal": "Dinner" }
_28
],
_28
"water": [
_28
{ "time": "08:15", "qty": 1 },
_28
{ "time": "09:31", "qty": 1 },
_28
{ "time": "10:42", "qty": 2 },
_28
{ "time": "10:42", "qty": 2 },
_28
{ "time": "12:07", "qty": 1 },
_28
{ "time": "14:58", "qty": 1 },
_28
{ "time": "17:15", "qty": 1 },
_28
{ "time": "18:40", "qty": 1 },
_28
{ "time": "19:05", "qty": 1 }
_28
],
_28
"exercise": [{ "time": "11:02", "duration": 0.5, "type": "Walking" }]
_28
}

Saving the Data

Once we've gathered all the data for a day, we need to store it in our database. In a NoSQL database, this can be a pretty easy process, as we can just create a record (document) for a specific user for a specific date and throw document into a collection and we're done. With SQL, we have some structure we have to work within, and in this case it looks like 4 separate tables: food, water, exercise, and notes. We'd want to do 4 separate inserts here, one for each table. If we don't have data for a specific table (say no exercise was recorded today) then we can skip that table.

If you're using SQL to store this data, you might want to save each table's data as it's entered in your data entry form (and not wait until all the data is entered.) Or you might want to create a database function that takes all the JSON data, parses it, and writes it to all the related tables in a single transaction. There's a lot of ways to handle this, but suffice it to say this: it's a bit more complicated than saving the data in a NoSQL database.

Retrieving the Data

If we want to display all the data for a single day, it's pretty much the same. With NoSQL you can grab the data for the user's day and then use it in your application. Nice! With SQL we need to query 4 tables to get all the data (or we could use a function to get it all in a single call.) Of course, when displaying the data, we'd need to first break up our JSON data into pieces that are needed by each section of our dashboard screen, and you could argue that it's simpler to map each SQL table with the dashboard section on the screen, but that's a trivial point.

Analyzing the Data

Now that we've saved the data and we can retrieve it and display it, let's use it for some analysis. Let's display a graph of how many total calories I've eaten over the past month. With SQL, this is a simple task:


_10
select
_10
date,
_10
sum(calories) as total_calories
_10
from food_log
_10
group by date
_10
where user_id = 'xyz' and day between '2022-01-01' and '2022-01-31'
_10
order by date;

Bam! Done! Now we can send those results to our graphing library and make a nice pretty picture of my eating habits.

But if we've stored this data in NoSQL, it gets a little more complicated. We'll need to:

  • grab all the data for the user for the month
  • parse each day's data to get the food log information
  • loop through each day and total the calories
  • send the aggregate data to our graphing module

If this is something we're going to do regularly, it makes sense to calculate the total calories for each day and store it in the day's document so we can get at that data faster. But that requires more work up front, and we still need to pull the data for each day and parse out that calorie total first. And if we update the data we still need to recalculate things and update that total. Eventually we'll want to do that with the water and exercise totals as well. The code will eventually start to get longer and more complex.

SQL and NoSQL Together - FTW

Let's see how we can use the power of SQL together with the ease-of-use of NoSQL in the same database to make this all a bit easier. We'll create a table for each day of data (for each user) and store the basic fields such as weight and notes first. Then we'll just throw the food_log, water_log, and exercise_log fields in a JSONB field.


_13
CREATE TABLE calendar (
_13
id uuid DEFAULT gen_random_uuid() NOT NULL,
_13
date date,
_13
user_id uuid NOT NULL,
_13
weight numeric,
_13
notes text,
_13
food_log jsonb,
_13
water_log jsonb,
_13
exercise_log jsonb
_13
);
_13
-- (Optional) - create a foreign key relationship for the user_id field
_13
ALTER TABLE ONLY calendar
_13
ADD CONSTRAINT calendar_user_id_fkey FOREIGN KEY (user_id) REFERENCES auth.users(id);

Now let's insert some data into the table. PostgreSQL offers both JSON and JSONB fields, and since the latter are more optimized by the database and much faster for query processing, we’ll almost always want to use JSONB. We’ll use JSONB fields for food_logwater_log, and exercise_log and just dump the data we got from our app right into those fields as a string:


_34
insert into calendar
_34
(date, user_id, weight, notes, food_log, water_log, exercise_log)
_34
values
_34
(
_34
'2022-01-01',
_34
'xyz',
_34
172.6,
_34
'This new diet is awesome!',
_34
'[
_34
{ "title": "Apple", "calories": 72, "meal": "Breakfast"},
_34
{ "title": "Oatmeal", "calories": 146, "meal": "Breakfast"},
_34
{ "title": "Sandwich", "calories": 445, "meal": "Lunch"},
_34
{ "title": "Chips", "calories": 280, "meal": "Lunch"},
_34
{ "title": "Cookie", "calories": 108, "meal": "Lunch"},
_34
{ "title": "Mixed Nuts", "calories": 175, "meal": "Snack"},
_34
{ "title": "Pasta/Sauce", "calories": 380, "meal": "Dinner"},
_34
{ "title": "Garlic Bread", "calories": 200, "meal": "Dinner"},
_34
{ "title": "Broccoli", "calories": 32, "meal": "Dinner"}
_34
]',
_34
'[
_34
{"time": "08:15", "qty": 1},
_34
{"time": "09:31", "qty": 1},
_34
{"time": "10:42", "qty": 2},
_34
{"time": "10:42", "qty": 2},
_34
{"time": "12:07", "qty": 1},
_34
{"time": "14:58", "qty": 1},
_34
{"time": "17:15", "qty": 1},
_34
{"time": "18:40", "qty": 1},
_34
{"time": "19:05", "qty": 1}
_34
]',
_34
'[
_34
{"time": "11:02", "duration": 0.5, "type": "Walking"}
_34
]'
_34
);

While that's a big insert statement, it sure beats doing inserts on 4 separate tables. With all those food entries and water log entries, we would have had to made 1 entry in the main table, then 9 food_log entries, 9 water_log entries, and one exercise_log entry for a total of 20 database records. We've wrapped that into a single record.

But How Do We Query This Data?

Great, we're collecting the data now, and it's easy to insert the data into the database. Editing the data isn't too bad either because we're just downloading the data to the client, updating the JSON field(s) as needed, and throwing them back into the database. Not too hard. But how can I query this data? What about that task from before? Let's display a graph of how many total calories I've eaten over the past month.

In this case, that data is stored inside the food_log field inside the calendar table. If only PostgreSQL had a way of converting JSONB arrays into individual database records (recordsets). Well, it does! The jsonb_array_elements function will do this for us, allowing to create a simple table we can use to calculate our caloric intake.

Here's some SQL to turn that food_log array into individual output records:


_10
select
_10
user_id,
_10
date,
_10
jsonb_array_elements(food_log)->>'title' as title,
_10
jsonb_array_elements(food_log)->'calories' as calories,
_10
jsonb_array_elements(food_log)->'meal' as meal
_10
from calendar
_10
where user_id = 'xyz'
_10
and date between '2022-01-01' and '2022-01-31';

This returns a table that looks like this:

datetitlecaloriesmeal
2022-01-01Apple72Breakfast
2022-01-01Oatmeal146Breakfast
2022-01-01Sandwich445Lunch
2022-01-01Chips280Lunch
2022-01-01Cookie108Lunch
2022-01-01Mixed Nuts175Snack
2022-01-01Pasta/Sauce380Dinner
2022-01-01Garlic Bread200Dinner
2022-01-01Broccoli32Dinner

A couple things to note:

  • jsonb_array_elements(food_log)->>'title' as title this returns a text field, since the ->> operator returns TEXT
  • jsonb_array_elements(food_log)->'calories' as calories this returns a JSON object, since the -> operator return JSON

If we want to sum the calories to get some totals, we can't have a JSON object, so we need to cast that to something more useful, like an INTEGER:

  • (jsonb_array_elements(food_log)->'calories')::INTEGER as calories this returns an INTEGER

Now we can't just throw the sum operator on this to get the total calories by day. If we try this:


_10
select
_10
date,
_10
sum((jsonb_array_elements(food_log)->'calories')::integer) as total_calories
_10
from calendar where user_id = 'xyz'
_10
and date between '2022-01-01' and '2022-01-31'
_10
group by date;

we get an error back from PostgreSQL: Failed to run sql query: aggregate function calls cannot contain set-returning function calls.

Instead, we need to think of this as a set of building blocks, where our first SQL statement returns a table:


_10
select
_10
date,
_10
(jsonb_array_elements(food_log)->'calories')::integer as calories
_10
from calendar where user_id = 'xyz'
_10
and date between '2022-01-01' and '2022-01-31';

Now we can take that "table" statement, throw some (parenthesis) around it, and query it:


_12
with data as
_12
(
_12
select
_12
date,
_12
(jsonb_array_elements(food_log)->'calories')::integer as calories
_12
from calendar
_12
where user_id = 'xyz'
_12
and date between '2022-01-01' and '2022-01-31'
_12
)
_12
select date, sum(calories)
_12
from data
_12
group by date;

This gives us exactly what we want:

datesum
2022-01-011838

If we add more data for the rest of the days of the month, we'll have all the data we need for a beautiful graph.

Searching the Data

What if we want to answer the question: How many calories were in the garlic bread I ate last month? This data is stored inside the food_log field in the calendar table. We can use the same type of query we used before to "flatten" the food_log data so we can search it.

To get every item I ate during the month of January, we can use:


_10
select
_10
date,
_10
jsonb_array_elements(food_log)->>'title' as title,
_10
(jsonb_array_elements(food_log)->'calories')::integer as calories
_10
from calendar
_10
where user_id = 'xyz'
_10
and date between '2022-01-01' and '2022-01-31'

Now to search for the garlic bread we can just put (parenthesis) around this to make a "table" and then search for the item we want:


_15
with my_food as
_15
(
_15
select
_15
date,
_15
jsonb_array_elements(food_log)->>'title' as title,
_15
(jsonb_array_elements(food_log)->'calories')::integer as calories
_15
from calendar
_15
where user_id = 'xyz'
_15
and date between '2022-01-01' and '2022-01-31'
_15
)
_15
select
_15
title,
_15
calories
_15
from my_food
_15
where title = 'Garlic Bread';

which gives us:

titlecalories
Garlic Bread200

Conclusion

If we take a little time to study the JSON Functions and Operators that PostgreSQL offers, we can turn Postgres into an easy-to-use NoSQL database that still retains all the power of SQL. This gives us a super easy way to store our complex JSON data coming from our application code in our database. Then we can use powerful SQL capabilities to analyze and present that data in our application. It's the best of both worlds!

More Postgres resources

Share this article

Build in a weekend, scale to millions