I have a MySQL database with several columns of data sampled every 5 minutes for two years. Displaying the data in Grafana works for ranges of a few days, but not over months or years.
I can manually query/extract the data to view moving averages by explicitly calling out columns:
truncate table environmental_history;
insert into environmental_history (logtime, panel_b, panel_c, cm_main_power, cm_aux1_power, cm_aux2_power)
SELECT logtime,
       avg(panel_b) OVER (ORDER BY gmttime ROWS BETWEEN 720 PRECEDING AND CURRENT ROW) as panel_b,
       avg(panel_c) OVER (ORDER BY gmttime ROWS BETWEEN 720 PRECEDING AND CURRENT ROW) as panel_c,
       avg(cm_main_power) OVER (ORDER BY gmttime ROWS BETWEEN 720 PRECEDING AND CURRENT ROW) as cm_main_power,
       avg(cm_aux1_power) OVER (ORDER BY gmttime ROWS BETWEEN 720 PRECEDING AND CURRENT ROW) as cm_aux1_power,
       avg(cm_aux2_power) OVER (ORDER BY gmttime ROWS BETWEEN 720 PRECEDING AND CURRENT ROW) as cm_aux2_power
FROM main.environmental
ORDER BY gmttime DESC;
I am feeding this table into Grafana, and I would love a way to automate building this history table, averaging all of the input table's columns into an averages table. I hope I'm saying that clearly enough.
How should I be creating additional tables, indexes, or whatnot to allow long-term Grafana views?
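For reference, here is the kind of scheduled refresh I have in mind, as an untested sketch: it assumes MySQL 8.0 (for the WINDOW clause), that the event scheduler is enabled, and it still names the columns explicitly rather than discovering them from the schema.

-- Hypothetical scheduled refresh; requires event_scheduler = ON and a
-- custom DELIMITER in the mysql client because of the BEGIN...END body.
CREATE EVENT refresh_environmental_history
ON SCHEDULE EVERY 1 HOUR
DO
BEGIN
  TRUNCATE TABLE environmental_history;
  INSERT INTO environmental_history (logtime, panel_b, panel_c, cm_main_power, cm_aux1_power, cm_aux2_power)
  SELECT logtime,
         avg(panel_b) OVER w,
         avg(panel_c) OVER w,
         avg(cm_main_power) OVER w,
         avg(cm_aux1_power) OVER w,
         avg(cm_aux2_power) OVER w
  FROM main.environmental
  WINDOW w AS (ORDER BY gmttime ROWS BETWEEN 720 PRECEDING AND CURRENT ROW);
END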
I have two columns, date and sales, and the objective is to use a CASE statement to create another column that shows the cumulative sum of sales for each date.
Here's the sales table:

date        sales
2019-04-01  50
2019-04-02  100
2019-04-03  100
What would be the best way to write a case statement in order to meet the requirements below?
Desired output:

date        sales  cumulative
2019-04-01  50     50
2019-04-02  100    150
2019-04-03  100    250
You don't need a CASE expression, but rather just use SUM() as a window function:
SELECT date, sales, SUM(sales) OVER (ORDER BY date) AS cumulative
FROM yourTable
ORDER BY date;
There's no need for a case statement here; you just need the SUM window function:
select date, sales, sum(sales) over (order by date, id) as cumulative
from sales;
(If date is unique, ordering by date alone is enough. If it is not, it is best practice to specify additional order-by columns so the result is deterministic, rather than depending on which order rows with the same date happen to be read in.)
When you use the sum window function with no order by clause, the default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, causing it to sum the expression for all rows being returned. When you specify an order by, the default frame becomes RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, summing the expression for all rows up through the current row in the given order, producing a cumulative total. Because this is the default, there is no need to specify it; if you do specify it, it goes after the ORDER BY in the window specification.
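Spelled out, that default frame makes the same query read like this:

select date, sales,
       sum(sales) over (
           order by date
           range between unbounded preceding and current row
       ) as cumulative
from sales
order by date;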
I have a table that looks like this:

id  slot              total
1   2022-12-01T12:00  100
2   2022-12-01T12:30  150
3   2022-12-01T13:00  200
There's an index on slot already. The table has ~100 million rows (and a bunch more columns not shown here).
I want to sum the total up to the current moment in time (EDIT: WASN'T CLEAR INITIALLY, I WILL PROVIDE A LOWER SLOT BOUND, SO THE SUM WILL BE OVER SOME NUMBER OF DAYS/WEEKS, NOT OVER FULL TABLE). Let's say the time is currently 2022-12-01T12:45. If I run select * from my_table where slot < CURRENT_TIMESTAMP(),
then I get back records 1 and 2.
However, in my data, the records represent forecasted sales within a time slot. I want to find the forecasts as of 2022-12-01T12:45, and so I want to find the proportion of the half hour slot of record 2 that has elapsed, and return that proportion of the total.
As of 2022-12-01T12:45 (assuming minute granularity), 50% of row 2 has elapsed, so I would expect the total to return as 150 / 2 = 75.
My current query works, but is slow. What are some ways I can optimise this, or other approaches I can take?
Also, how can we extend this solution to be generalised to any interval frequency? Maybe tomorrow we change our forecasting model and the data comes in sporadically. The hardcoded 30 would not work in that case.
select sum(fraction * total) as t
from (
    select total,
           least(
               timestampdiff(minute, slot, current_timestamp()),
               30
           ) / 30 as fraction
    from my_table
    where slot <= current_timestamp()
) as elapsed;
Consider computing the running sum first, then subtracting the un-elapsed portion of the last slot. To keep the last row's own total available, I'd apply window functions instead of aggregation and limit the output to the last row.
SET @current_time = CURRENT_TIMESTAMP();

WITH cte AS (
    SELECT slot,
           SUM(total) OVER (ORDER BY slot) AS total,
           total AS rowtotal
    FROM my_table
    WHERE slot < @current_time
    ORDER BY slot DESC
    LIMIT 1
)
SELECT slot,
       total - (30 - TIMESTAMPDIFF(MINUTE, slot, @current_time)) / 30 * rowtotal AS total
FROM cte;
Note 1: Adding an index on the slot field is likely to boost this query's performance.
Note 2: If the query runs over millions of rows, the current timestamp is likely to change while it executes. Store it in a variable before running the query, as above (or in another CTE).
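To generalise to arbitrary interval frequencies (the asker's follow-up), one sketch is to infer each slot's width from the gap to the next row with LEAD() instead of hardcoding 30; the 30-minute fallback for the final row is an assumption:

SET @current_time = CURRENT_TIMESTAMP();

SELECT SUM(LEAST(TIMESTAMPDIFF(MINUTE, slot, @current_time), width) / width * total) AS t
FROM (
    SELECT slot,
           total,
           -- Slot width in minutes, inferred from the next row; the last row
           -- has no successor, so fall back to 30 minutes (assumption).
           COALESCE(TIMESTAMPDIFF(MINUTE, slot, LEAD(slot) OVER (ORDER BY slot)), 30) AS width
    FROM my_table
    -- WHERE slot >= '2022-11-01'  -- hypothetical lower bound to limit the scan
) AS s
WHERE slot < @current_time;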
Create a BTREE index on the slot column, as it has high selectivity.
Some background first. We have a MySQL database with a "live currency" table. We use an API to pull the latest currency values for different currencies, every 5 seconds. The table currently has over 8 million rows.
Structure of the table is as follows:
id (INT 11 PK)
currency (VARCHAR 8)
value (DECIMAL)
timestamp (TIMESTAMP)
Now we are trying to use this table to plot the data on a graph. We are going to have various graphs, e.g. Live, Hourly, Daily, Weekly, Monthly.
I'm having a bit of trouble with the query. Using the Weekly graph as an example, I want to output data from the last 7 days, in 15 minute intervals. So here is how I have attempted it:
SELECT *
FROM currency_data
WHERE ((currency = 'GBP')) AND (timestamp > '2017-09-20 12:29:09')
GROUP BY UNIX_TIMESTAMP(timestamp) DIV (15 * 60)
ORDER BY id DESC
This outputs the data I want, but the query is extremely slow. I have a feeling the GROUP BY clause is the cause.
Also BTW I have switched off the sql mode 'ONLY_FULL_GROUP_BY' as it was forcing me to group by id as well, which was returning incorrect results.
Does anyone know of a better way of doing this query which will reduce the time taken to run the query?
You may want to create summary tables for each of the graphs you want to do.
If your data really is coming every 5 seconds, you can attempt something like:
SELECT *
FROM currency_data cd
WHERE currency = 'GBP' AND
timestamp > '2017-09-20 12:29:09' AND
UNIX_TIMESTAMP(timestamp) MOD (15 * 60) BETWEEN 0 AND 4
ORDER BY id DESC;
For both this query and your original query, you want an index on currency_data(currency, timestamp, id).
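If you go the summary-table route, a minimal sketch might look like this (the table name, bucket column, and DECIMAL precision are assumptions):

-- Hypothetical 15-minute summary table.
CREATE TABLE currency_data_15m (
    currency  VARCHAR(8) NOT NULL,
    bucket    TIMESTAMP  NOT NULL,   -- start of the 15-minute interval
    avg_value DECIMAL(18,8),         -- precision is an assumption
    PRIMARY KEY (currency, bucket)
);

-- Backfill / refresh: round each timestamp down to its 15-minute bucket.
INSERT INTO currency_data_15m (currency, bucket, avg_value)
SELECT currency,
       FROM_UNIXTIME(UNIX_TIMESTAMP(timestamp) DIV (15 * 60) * (15 * 60)) AS bucket,
       AVG(value) AS avg_value
FROM currency_data
GROUP BY currency, bucket;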
I'm using a MySQL database to store values from some energy measurement system. The problem is that the DB contains millions of rows, and the queries take somewhat long to complete. Are the queries optimal? What should I do to improve them?
The database table consists of rows with 15 columns each (t, UL1, UL2, UL3, PL1, PL2, PL3, P, Q1, Q2, Q3,CosPhi1, CosPhi2, CosPhi3, i), where t is time, P is total power and i is some identifier.
Seeing as I display the data in graphs grouped into different intervals (15 minutes, 1 hour, 1 day, 1 month), I want to group the queries the same way.
As an example I have a graph that shows the kWh for every day in the current year. The query to gather the data goes like this:
SELECT t, SUM(P) as P
FROM `table`
WHERE i = 0 and t >= '2015-01-01 00:00:00'
GROUP BY DAY(t), MONTH(t)
ORDER BY t
The database has been gathering measurements for 13 days, and this query alone is already taking 2-3 seconds to complete. Those 13 days have added about 1-1.3 million rows to the db, as a new row gets added every second.
Is this query optimal?
I would actually create a secondary aggregate table with a column for the day and one for the total. Then, via a trigger, your insert into the detail table can update the secondary aggregate table. This way, you can sum the daily table, which will be much quicker, and yet still have the per-second table if you need to look at the granular details.
Having aggregate tables can be a common time-saver for querying, especially for read-only data, or data you know won't be changing. Then, if you want more granular detail, such as hourly or 15-minute intervals, go directly to the raw data.
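A hedged sketch of that idea, with all names hypothetical (measurements stands in for the raw per-second table described above):

-- Daily aggregate table, maintained by an AFTER INSERT trigger.
CREATE TABLE daily_totals (
    day   DATE   NOT NULL PRIMARY KEY,
    p_sum DOUBLE NOT NULL DEFAULT 0
);

CREATE TRIGGER measurements_after_insert
AFTER INSERT ON measurements
FOR EACH ROW
    INSERT INTO daily_totals (day, p_sum)
    VALUES (DATE(NEW.t), NEW.P)
    ON DUPLICATE KEY UPDATE p_sum = p_sum + NEW.P;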
For this query:
SELECT t, SUM(P) as P
FROM `table`
WHERE i = 0 and t >= '2015-01-01 00:00:00'
GROUP BY DAY(t), MONTH(t)
ORDER BY t
The optimal index is a covering index: table(i, t, p).
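Concretely, that might be (backticks because table here is a placeholder name that happens to be a reserved word):

CREATE INDEX idx_i_t_p ON `table` (i, t, P);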
2-3 seconds for 1+ million rows suggests that you already have an index.
You may want to consider DRapp's suggestion and use summary tables. In a few months, you will have so much data that historical queries could take a long time.
In the meantime, though, indexes and partitioning might provide sufficient performance for your needs.
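If you experiment with partitioning, a hedged sketch is monthly RANGE partitions on the time column; note that MySQL requires the partitioning column to be part of every unique key, which may force schema changes:

ALTER TABLE `table`
PARTITION BY RANGE (TO_DAYS(t)) (
    PARTITION p2015_01 VALUES LESS THAN (TO_DAYS('2015-02-01')),
    PARTITION p2015_02 VALUES LESS THAN (TO_DAYS('2015-03-01')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);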
Why does MySQL examine all rows when I switch to a 1-year range?
--Table dates
id (int)
date (timestamp)
value (varchar)
PRIMARY(id), date_index(date)
1750 rows
Executing
EXPLAIN SELECT * FROM dates WHERE date BETWEEN '2011-04-27' AND '2011-04-28'
The rows column displays 18 rows.
If I increase or decrease the BETWEEN range, by 1 year for example, the rows column displays 1750 rows.
EXPLAIN SELECT * FROM dates WHERE date BETWEEN '2011-04-27' AND '2012-04-28'
EXPLAIN SELECT * FROM dates WHERE date BETWEEN '2010-04-27' AND '2011-04-28'
The optimizer builds the query plan based on several things, including the amount and distribution of the data. My best guess is that you don't have much more than a year's worth of data, or that using the index for a year's worth of data wouldn't read many fewer rows than the full table.
If that doesn't sound right, can you post the output of:
SELECT MIN(date), MAX(date) FROM dates;
SELECT COUNT(*) FROM dates WHERE date BETWEEN '2011-04-27' AND '2012-04-28';
This article I wrote shows some examples of how the optimizer works too: What makes a good MySQL index? Part 2: Cardinality
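As an aside, if the full scan really is slower for your data, you can test forcing the index path explicitly (and compare actual timings rather than just EXPLAIN output):

SELECT * FROM dates FORCE INDEX (date_index)
WHERE date BETWEEN '2011-04-27' AND '2012-04-28';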