Introduction To PostgreSQL As A Time Series Database For IoT
Nguồn: https://www.c-sharpcorner.com/article/introduction-to-postgresql-as-a-time-series-database-for-iot/
A time-series database is a database system that is designed to store and load data over time through associated pairs of times and values.
Time Series databases were initially founded for the Stock market but later used in many fields like health monitoring, performance monitoring, and many statistics. In this article, I will talk about the main features in the Time Series databases with examples by using Postgres.
Choosing a good time-series database is an important issue when developing IoT applications. So, you must have a good understanding of the basics of time series databases.
For this article, you must install PostgreSQL and pgAdmin 4. You can download it from,
Related article,
Time Series Database
The idea, in a nutshell, is very simple. You have a series of events that occurred over time.
The data is collected sequentially over time. As shown above, the data is generated at regular intervals. The above-shown data is an elementary example, but in the real world, you have many values that should be stored very quickly in the database over time.
A stock market crash is a good use case for using a time series database:
\
Source: https://en.wikipedia.org/wiki/Stock_market_crash
This kind of database is beginning to be important again because of the IoT. If you remember my last article about the IoT, I talked about the IoT in a nutshell. The devices generate a lot of data/events over time, and that data should be sequentially stored. If you look at the time series database definition above, you will find that is what the time series databases are doing. That is why time series databases are perfect for many IoT applications.
As shown in Figure-3 the data must be written to disk in the order it arrives. If you have continuous data streaming, then it is very important to divide the data into chunks or partitions. In other words, we must divide the data to avoid one large table.
Partitioning
Partitioning by Time Range
Separate time series data into several times range partitions tables that support continuous and non-overlapping time range partitions.
Example
By date or time. For example, each month is the partition key, which is physically stored on the disk.
Creating the database
First, I will create a big table without partitions, and after that, we will divide this big table into small partitions. For example, we can store each day or hour in a partition.
I have opened the Query Tool from the Tools menu. Then I have copied and pasted the below text and pressed F5.
/* Create the IoT Series Schema */
CREATE SCHEMA iot_time_series
AUTHORIZATION postgres;
/* Create a table for the raspberry temperature measurement */
CREATE TABLE iot_time_series.temperatures
(
time timestamp without time zone NOT NULL,
deviceId VARCHAR ( 50 ) NOT NULL,
location VARCHAR ( 50 ),
value_in_celcius REAL
);
ALTER TABLE iot_time_series.temperatures OWNER to postgres;
You can see the created schema with the table after executing the code.
Now let us store some temperatures in the database. You can use the attached file (Temperatures.cvs) or use the Microsoft Raspberry IoT simulator.
/* Copy the data from the attached c:/temp/Temperatures.cvs */
COPY iot_time_series.temperatures(time, deviceid, location , value_in_celcius)
FROM 'c:/temp/Temperatures.cvs' DELIMITER ',';
Indexing
Creating indices for your search/order/group columns is important. The most important one is the time index. If you want to group or search for a specified device or location, then you can add an index for each desired column or add compound indexes for that.
Indices are very important to achieve better performance. So please try to understand the basic indexing concepts and the B-trees.
An example for index creation:
CREATE INDEX idx_time ON iot_time_series.temperatures(time);
/* Or ( Or both) */
CREATE INDEX idx_time_location ON iot_time_series.temperatures(time,location);
Partitions
Now we are in the most important part. I will divide the big table into small parts. In addition, it is important to create the indexes for each small partition. In other words, we will split the temperatures table into three partitions.
Every day is separated in a partition. So I can do it as follows: Partition-1 from day 1 until day 10; and Partition-2 from 11 to 20 and from 21 to 31. Again, To achieve better performance, we have to create the indexes again, but in this case, we have to create the indexes for each partition. This technique helps us to make the divided database more scalable.
\
Figure 4 - example for partitioning, source: https://www.postgresql.fastware.com/postgresql-insider-prt-ove
/* Create the table for the temperatures to be partitioned */
CREATE TABLE iot_time_series.temperatures_partitoned
(
time timestamp NOT NULL,
partition_by_day INTEGER,
deviceId VARCHAR ( 50 ) NOT NULL,
location VARCHAR ( 50 ),
value_in_celcius REAL
)
PARTITION BY RANGE (partition_by_day);
/* partition for days 1.. to .. 10 */
CREATE TABLE iot_time_series.temperatures_partition_1 PARTITION OF iot_time_series.temperatures_partitoned
FOR VALUES FROM (1) TO (10);
CREATE INDEX idx_temperatures_partition_1 ON iot_time_series.temperatures_partition_1(partition_by_day);
/* partition for days 11.. to .. 20 */
CREATE TABLE iot_time_series.temperatures_partition_2 PARTITION OF iot_time_series.temperatures_partitoned
FOR VALUES FROM (11) TO (20);
CREATE INDEX idx_temperatures_partition_2 ON iot_time_series.temperatures_partition_2(partition_by_day);
/* partition for days 21.. to .. 31 */
CREATE TABLE iot_time_series.temperatures_partition_3 PARTITION OF iot_time_series.temperatures_partitoned
FOR VALUES FROM (21) TO (31);
CREATE INDEX idx_temperatures_partition_3 ON iot_time_series.temperatures_partition_3(partition_by_day);
/* Copy the simulated data from old table to the new partitioned table. */
INSERT INTO iot_time_series.temperatures_partitoned
( time, partition_by_day, deviceId, location, value_in_celcius)
(SELECT time, extract(day from time), deviceId, location, value_in_celcius
FROM iot_time_series.temperatures);
Query the data
Find the max device temperature in the first three days.
SELECT location, max(value_in_celcius)
FROM iot_time_series.temperatures_partitoned
WHERE time BETWEEN '2022-01-01' AND '2022-01-04'
GROUP BY location;
.....
"Paços de Ferreira" 32
"Daxi" 32
"Kawanoechō" 29
"Akhfennir" 6
"Timba Timuk" 7
"Dobrich" 37
"Zamarski" 26
"Valladolid" 15
"Mercedes" 39
"Karlsruhe" 40
"Paratinga" 30
"Almaguer" 26
"As Salţ" 8
.....
As you see, the max temperature was 40 in Karlsruhe.
Predication
First, you have to take a look at the linear regression in the given link below. I have taken the same data from the provided below example,
https://www.mathsisfun.com/data/least-squares-regression.html
Step 1
Create the predication table, and I have inserted the same data from the above link (math fun).
/* Create a table for the raspberry temperature predication */
CREATE TABLE iot_time_series.temperatures_predication
(
time timestamp without time zone NOT NULL,
day integer,
deviceId VARCHAR ( 50 ) NOT NULL,
location VARCHAR ( 50 ),
value_in_celcius REAL
);
ALTER TABLE iot_time_series.temperatures_predication OWNER to postgres;
INSERT INTO iot_time_series.temperatures_predication (time, day, deviceId, location, value_in_celcius)
VALUES
('2022-01-02 01:00:00', 2, 'device1', 'Stuttgart', 4),
('2022-01-03 01:00:00', 3,'device1', 'Stuttgart', 5),
('2022-01-05 01:00:00', 5, 'device1', 'Stuttgart', 7),
('2022-01-07 01:00:00', 7, 'device1', 'Stuttgart', 10),
('2022-01-09 01:00:00', 9, 'device1', 'Stuttgart', 15);
If you remember, our aim was to calculate the values m (slope) and b (y-intercept) in the equation of a line,
y = mx + b
Where,
- y = how far up
- x = how far along
- m = Slope or Gradient (how steep the line is)
- b = the Y Intercept (where the line crosses the Y-axis)
I will use the Postgres linear function to create the prediction model.
SELECT
regr_slope(value_in_celcius, day) slope,
regr_intercept(value_in_celcius, day) intercept
FROM
iot_time_series.temperatures_predication
WHERE
time BETWEEN '2022-01-01' AND '2022-01-10';
Based on the given values, we calculate the slope and the intercept, as you see above.
- X = Day
- Y = Value in Celsius
Now we have created the linear prediction formula. We can use this formula to predict future temperatures on any day.
Applying the prediction model
Which temperature is at 2022-01-08? (Day 8)
So, I will modify the formula as below,
SELECT
regr_slope(value_in_celcius, day ) * 8 +
regr_intercept(value_in_celcius, day) predicatedTemperatureInDay
FROM
iot_time_series.temperatures_predication
WHERE
time BETWEEN '2022-01-01' AND '2022-01-10';
When you excute the code above then, you will receive 12.45 so that means the predictd temperature at 2022.01.08 is 12.45.
View from the prediction model
Summary
Data is collected sequentially over time in a time series or a time-stamped database. This kind of database has been used for a long time, and it is a reliable technology that is applied in many fields. In addition, the time series databases are perfect for many IoT seniors. Finally, the article explains how Postgres can be used as a time series database and how Postgres can generate and retrieve data and make a straightforward prediction model.