I have a table say "sample" which saves a new record each five minutes.
Users might ask for data collected for a specific sampling interval of say 10 min or 30 min or an hour.
Since I have a record every five minutes, when a user asks for data with a hour sample interval, I will have to club/group every 12 (60/5) records in to one record (already sorted based on the time-stamp), and the criteria could be either min/max/avg/last value.
I was trying to do this in Java once I fetch all the records, and am seeing pretty bad performance as I have to iterate through the collection multiple times, I have read of other alternatives like jAgg and lambdaj, but wanted to check if that's possible in SQL (MySQL) itself.
The sampling interval is dynamic and the aggregation function (min/max/avg/last) too is user provided.
Any pointers ?
You can do this in SQL, but you have to carefully construct the statement. Here is an example by hour for all four aggregations:
select min(datetime) as datetime,
min(val) as minval, max(val) as maxval, avg(val) as avgval,
substring_index(group_concat(val order by datetime desc), ',', 1) as lastval
from table t
group by floor(to_seconds(datetime) / (60*60));
Related
Some background first. We have a MySQL database with a "live currency" table. We use an API to pull the latest currency values for different currencies, every 5 seconds. The table currently has over 8 million rows.
Structure of the table is as follows:
id (INT 11 PK)
currency (VARCHAR 8)
value (DECIMAL
timestamp (TIMESTAMP)
Now we are trying to use this table to plot the data on a graph. We are going to have various different graphs, e.g: Live, Hourly, Daily, Weekly, Monthly.
I'm having a bit of trouble with the query. Using the Weekly graph as an example, I want to output data from the last 7 days, in 15 minute intervals. So here is how I have attempted it:
SELECT *
FROM currency_data
WHERE ((currency = 'GBP')) AND (timestamp > '2017-09-20 12:29:09')
GROUP BY UNIX_TIMESTAMP(timestamp) DIV (15 * 60)
ORDER BY id DESC
This outputs the data I want, but the query is extremely slow. I have a feeling the GROUP BY clause is the cause.
Also BTW I have switched off the sql mode 'ONLY_FULL_GROUP_BY' as it was forcing me to group by id as well, which was returning incorrect results.
Does anyone know of a better way of doing this query which will reduce the time taken to run the query?
You may want to create summary tables for each of the graphs you want to do.
If your data really is coming every 5 seconds, you can attempt something like:
SELECT *
FROM currency_data cd
WHERE currency = 'GBP' AND
timestamp > '2017-09-20 12:29:09' AND
UNIX_TIMESTAMP(timestamp) MOD (15 * 60) BETWEEN 0 AND 4
ORDER BY id DESC;
For both this query and your original query, you want an index on currency_data(currency, timestamp, id).
I'm using a MySQL database to store values from some energy measurement system. The problem is that the DB contains millions of rows, and the queries take somewhat long to complete. Are the queries optimal? What should I do to improve them?
The database table consists of rows with 15 columns each (t, UL1, UL2, UL3, PL1, PL2, PL3, P, Q1, Q2, Q3,CosPhi1, CosPhi2, CosPhi3, i), where t is time, P is total power and i is some identifier.
Seeing as I display the data in graphs grouped in different intervals (15 minutes, 1 hour, 1 day, 1 month) I want to group the querys as such.
As an example I have a graph that shows the kWh for every day in the current year. The query to gather the data goes like this:
SELECT t, SUM(P) as P
FROM table
WHERE i = 0 and t >= '2015-01-01 00:00:00'
GROUP BY DAY(t), MONTH(t)
ORDER BY t
The database has been gathering measurements for 13 days, and this query alone is already taking 2-3 seconds to complete. Those 13 days have added about 1-1.3 million rows to the db, as a new row gets added every second.
Is this query optimal?
I would actually create a secondary table that has a column for each DAY, and one for the total. Then, via a trigger, your insert into the detail table can update the secondary aggregate table. This way, you can sum the DAILY table which will be much quicker, and yet still have the per second table if you needed to look at the granular level details.
Having aggregate tables can be a common time-saver for querying, especially for read-only types of data, or data you know wont be changing. Then, if you want more granular detail such as hourly or 15 minute intervals, go directly to the raw data.
For this query:
SELECT t, SUM(P) as P
FROM table
WHERE i = 0 and t >= '2015-01-01 00:00:00'
GROUP BY DAY(t), MONTH(t)
ORDER BY t
The optimal index is a covering index: table(i, t, p).
2-3 seconds for 1+ million rows suggests that you already have an index.
You may want to consider DRapp's suggestion and use summary tables. In a few months, you will have so much data that historical queries could be taking a long time.
In the meantime, though, indexes and partitioning might provide sufficient performance for your needs.
I'm reasonably new to Access and having trouble solving what should be (I hope) a simple problem - think I may be looking at it through Excel goggles.
I have a table named importedData into which I (not so surprisingly) import a log file each day. This log file is from a simple data-logging application on some mining equipment, and essentially it saves a timestamp and status for the point at which the current activity changes to a new activity.
A sample of the data looks like this:
This information is then filtered using a query to define the range I want to see information for, say from 29/11/2013 06:00:00 AM until 29/11/2013 06:00:00 PM
Now the object of this is to take a status entry's timestamp and get the time difference between it and the record on the subsequent row of the query results. As the equipment works for a 12hr shift, I should then be able to build a picture of how much time the equipment spent doing each activity during that shift.
In the above example, the equipment was in status "START_SHIFT" for 00:01:00, in status "DELAY_WAIT_PIT" for 06:08:26 and so-on. I would then build a unique list of the status entries for the period selected, and sum the total time for each status to get my shift summary.
You can use a correlated subquery to fetch the next timestamp for each row.
SELECT
i.status,
i.timestamp,
(
SELECT Min([timestamp])
FROM importedData
WHERE [timestamp] > i.timestamp
) AS next_timestamp
FROM importedData AS i
WHERE i.timestamp BETWEEN #2013-11-29 06:00:00#
AND #2013-11-29 18:00:00#;
Then you can use that query as a subquery in another query where you compute the duration between timestamp and next_timestamp. And then use that entire new query as a subquery in a third where you GROUP BY status and compute the total duration for each status.
Here's my version which I tested in Access 2007 ...
SELECT
sub2.status,
Format(Sum(Nz(sub2.duration,0)), 'hh:nn:ss') AS SumOfduration
FROM
(
SELECT
sub1.status,
(sub1.next_timestamp - sub1.timestamp) AS duration
FROM
(
SELECT
i.status,
i.timestamp,
(
SELECT Min([timestamp])
FROM importedData
WHERE [timestamp] > i.timestamp
) AS next_timestamp
FROM importedData AS i
WHERE i.timestamp BETWEEN #2013-11-29 06:00:00#
AND #2013-11-29 18:00:00#
) AS sub1
) AS sub2
GROUP BY sub2.status;
If you run into trouble or need to modify it, break out the innermost subquery, sub1, and test that by itself. Then do the same for sub2. I suspect you will want to change the WHERE clause to use parameters instead of hard-coded times.
Note the query Format expression would not be appropriate if your durations exceed 24 hours. Here is an Immediate window session which illustrates the problem ...
' duration greater than one day:
? #2013-11-30 02:00# - #2013-11-29 01:00#
1.04166666667152
' this Format() makes the 25 hr. duration appear as 1 hr.:
? Format(#2013-11-30 02:00# - #2013-11-29 01:00#, "hh:nn:ss")
01:00:00
However, if you're dealing exclusively with data from 12 hr. shifts, this should not be a problem. Keep it in mind in case you ever need to analyze data which spans more than 24 hrs.
If subqueries are unfamiliar, see Allen Browne's page: Subquery basics. He discusses correlated subqueries in the section titled Get the value in another record.
I store top-views and 'likes' in a table called 'counts'. Once a night I run this query
UPDATE `counts` SET rank=d7+d6+d5+d4+d3+d2+d1,d7=d6,d6=d5,d5=d4,d4=d3,d3=d2,d2=d1,d1=0
Each day of the week has a d1-d7 variable, and we move it 'down' one each night and re-calculate the sum.
As my site has grown, this query now takes ~20 minutes.
I'm looking for suggestions on how to organize this more efficiently, as it seems like it might be a common pattern.
As the comments say, we need to see the schema. But I'll make a suggestion anyway. Don't have 7 different fields d1-d7. What if later you decide to keep the score over a year? Ouch.
I'm going to assume that counts has view_id as its PK. Then have another table ranks with columns view_id (set as FK into counts), rank (generalizes d1-d7, whatever datatype they are) and rank_date, which is a date. Now every night you have
UPDATE counts SET rank = (SELECT SUM(rank) FROM ranks r WHERE r.view_id=counts.view_id
AND r.rank_date>=DATE_SUB(CURDATE(), INTERVAL 1 WEEK) );
[Some RDBMSs allow a JOIN-type syntax in UPDATE queries. I believe MySQL understands something similar to the following, but it isn't my usual RDBMS
UPDATE counts, (SELECT view_id, SUM(rank) AS srank FROM ranks r
WHERE r.rank_date>=DATE_SUB(CURDATE(), INTERVAL 1 WEEK)
GROUP BY r.view_id) AS q1
SET rank = srank
WHERE counts.view_id=q1.view_id;
]
If so, that will probably run faster than the first version.
Meanwhile, optionally to clean up, you can delete rows from ranks that are more than 1 week old, but in a more flexible schema, you don't have to.
I have a table containing access logs. I want to know how many accesses to resource_id '123' occured in each hour in a 24 hour day.
My first thought for retrieving this info is just looping through each hour and querying the table in each loop with something like... and time like '$hour:%', given that the time field holds data in the format 15:47:55.
Is there a way I can group by the hours and retrieve each hour and the number of rows within each hour in a single query?
Database is MySQL, language is PHP.
SELECT HOUR(MyDatetimeColumn) AS h, COUNT(*)
FROM MyTable
GROUP BY h;
You can use the function HOUR to get the hour out of the time. Then you should be able to group by that.