I am trying to optimize a SQL query on a large events table (10 million+ rows) for a date range search. I already have a unique index on this table on (lid, did, measurement, date). The query below tries to get the events of three types of measurement (Kilowatts, Current and Voltage) for every 2-second interval in the date column:
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey
from events
WHERE lid = 1
and did = 1
and measurement IN ("Voltage")
group by timekey
UNION
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey
from events
WHERE lid = 1
and did = 1
and measurement IN ("Current")
group by timekey
UNION
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey
from events
WHERE lid = 1
and did = 1
and measurement IN ("Kilowatts")
group by timekey
This is the table I am querying:
id | lid | did | measurement | date
---+-----+-----+-------------+--------------------
 1 |   1 |   1 | Kilowatts   | 2020-04-27 00:00:00
 2 |   1 |   1 | Current     | 2020-04-27 00:00:00
 3 |   1 |   1 | Voltage     | 2020-04-27 00:00:00
 4 |   1 |   1 | Kilowatts   | 2020-04-27 00:00:01
 5 |   1 |   1 | Current     | 2020-04-27 00:00:01
 6 |   1 |   1 | Voltage     | 2020-04-27 00:00:01
 7 |   1 |   1 | Kilowatts   | 2020-04-27 00:00:02
 8 |   1 |   1 | Current     | 2020-04-27 00:00:02
 9 |   1 |   1 | Voltage     | 2020-04-27 00:00:02
The expected result is to retrieve all rows whose date equals 2020-04-27 00:00:00 or 2020-04-27 00:00:02. The query provided above works as expected, but because it uses UNION to look up the different measurements, I believe it might not be the optimal way to do it.
Can any SQL expert help me tune this query to improve its performance?
You have one record every second for each and every measurement, and you want to select one record every two seconds.
You could try:
select *
from events
where lid = 1
  and did = 1
  and measurement IN ('Voltage', 'Current')
  and extract(second from date) % 2 = 0
This would select records that have an even second part.
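If the timekey column from the original query is still needed, the same even-second filter could be folded into a single statement covering all three measurements. A minimal sketch, assuming every measurement really does produce one row per second:
-- Sketch: replaces the three UNIONed queries with one pass over the table
SELECT *, FLOOR(UNIX_TIMESTAMP(date) / 2) AS timekey
FROM events
WHERE lid = 1
  AND did = 1
  AND measurement IN ('Voltage', 'Current', 'Kilowatts')
  AND EXTRACT(SECOND FROM date) % 2 = 0;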
Alternatively, if you always have one record every second, another option is row_number() (this requires MySQL 8.0):
select *
from (
    select e.*,
           row_number() over(partition by measurement order by date) rn
    from events e
    where lid = 1
      and did = 1
      and measurement IN ('Voltage', 'Current')
) t
where rn % 2 = 1
This is a bit less accurate than the previous query though.
Your query is actually three queries combined into one. Luckily, they all select rows of data based on similar columns. If you want to make this query run fast, you can add the following index:
create index ix1 on events (lid, did, measurement);
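To check that the optimizer actually uses it (or the existing unique index), EXPLAIN can be run against one branch of the query; the plan output will vary with your data:
-- Sketch: inspect the chosen index for a single-measurement lookup
EXPLAIN
SELECT *, FLOOR(UNIX_TIMESTAMP(date) / 2) AS timekey
FROM events
WHERE lid = 1
  AND did = 1
  AND measurement = 'Voltage';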
In addition to the above suggestions, changing the PRIMARY KEY will give you a little more performance:
PRIMARY KEY(lid, did, date, measurement)
and toss id.
Caveat: there could be hiccups if two readings come in at exactly the same "second". This could easily happen if one reading comes in just after the clock ticks and the next comes in just before the next tick.
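For reference, a rough sketch of what the table definition might look like with that key; the column types here are assumptions, not taken from the question:
CREATE TABLE events (
  lid         INT NOT NULL,
  did         INT NOT NULL,
  measurement VARCHAR(20) NOT NULL,
  date        DATETIME NOT NULL,
  PRIMARY KEY (lid, did, date, measurement)  -- replaces the surrogate id
) ENGINE=InnoDB;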
Related
I need help optimizing my 3 queries into one.
I have 2 tables. The first has a list of the image processing servers I use; different servers can handle different simultaneous job loads at a time, so I have a quota field, as seen below.
First table, "img_processing_servers":
| id | server_url   | server_key | server_quota |
|  1 | examp.uu.co  | X0X1X2XX3X | 5            |
|  2 | examp2.uu.co | X0X1X2YX3X | 3            |
The second table registers whether a job is currently being performed on a server.
Second table, "img_servers_lock":
| id | lock_server | timestamp |
| 1 | 1 | 2020-04-30 12:08:09 |
| 2 | 1 | 2020-04-30 12:08:09 |
| 3 | 1 | 2020-04-30 12:08:09 |
| 4 | 2 | 2020-04-30 12:08:09 |
| 5 | 2 | 2020-04-30 12:08:09 |
| 6 | 2 | 2020-04-30 12:08:09 |
Basically, what I want to achieve is that my image servers don't go past their max quota and crash, so the 3 queries I would like to combine are:
Select an available server that hasn't reached its quota, and then insert a lock record for it.
SELECT * FROM `img_processing_servers` WHERE
SELECT COUNT(timestamp) FROM `img_servers_lock` WHERE `lock_server` = id
-- if the count is less than the quota, go ahead and register the use
INSERT INTO `img_servers_lock`(`lock_server`, `timestamp`) VALUES (id_of_available_server, now())
How would I go about creating this single query?
My goal is to keep my image servers safe from overload.
Join the two tables and put that into an INSERT query.
INSERT INTO img_servers_lock(lock_server, timestamp)
SELECT s.id, NOW()
FROM img_processing_servers s
LEFT JOIN img_servers_lock l ON l.lock_server = s.id
GROUP BY s.id
HAVING IFNULL(COUNT(l.id), 0) < s.server_quota
ORDER BY s.server_quota - IFNULL(COUNT(l.id), 0) DESC
LIMIT 1
The ORDER BY clause makes it select the server with the most available quota.
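If img_servers_lock.id is an AUTO_INCREMENT column (an assumption, not stated in the question), the application can then read back which server was locked; if the INSERT affected zero rows, every server was already at its quota:
-- Sketch: find out which server the lock was registered for
SELECT lock_server
FROM img_servers_lock
WHERE id = LAST_INSERT_ID();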
OK, so I ran into one small thing that was giving me a bug: s.server_quota had to be added to the GROUP BY for it to work in the HAVING clause.
INSERT INTO img_servers_lock(lock_server, timestamp)
SELECT s.id, NOW()
FROM img_processing_servers s
LEFT JOIN img_servers_lock l ON l.lock_server = s.id
GROUP BY s.id, s.server_quota
HAVING IFNULL(COUNT(l.id), 0) < s.server_quota
ORDER BY s.server_quota - IFNULL(COUNT(l.id), 0) DESC
LIMIT 1
Thanks again Barmar!
I have a table called updates which stores the distance of a vehicle at the captured_at date. Using MySQL, how can I get the SUM of the differences between the first captured update and the latest captured update per vehicle?
updates table:
id | vehicle_id | distance | captured_at
1 | 1 | 100 | 2018-02-10
2 | 1 | 50 | 2018-02-05
3 | 1 | 75 | 2018-02-07
4 | 2 | 200 | 2018-02-07
5 | 2 | 300 | 2018-02-09
The result I'm expecting is:
(100-50) + (300-200) = 150
One thing to keep in mind is that a bigger id does not necessarily mean that it's the latest update, as you can see in the example above.
(Comment: naming your tables with reserved words is a bad idea)
Getting the smallest and largest values is trivial:
SELECT vehicle_id, MAX(distance) - MIN(distance)
FROM `updates`
GROUP BY vehicle_id;
Adding these values is trivial when you know that a SELECT query can be used in place of a table - but you also need to create aliases for the aggregated attributes:
SELECT SUM(diff)
FROM (
SELECT vehicle_id, MAX(distance) - MIN(distance) AS diff
FROM `updates`
GROUP BY vehicle_id
) AS src
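Note that MAX(distance) - MIN(distance) assumes the distance only ever increases. If it can decrease, a window-function variant (MySQL 8.0+) that literally takes the latest reading minus the first might look like this; a sketch, not tested against the poster's data:
SELECT SUM(diff)
FROM (
    SELECT DISTINCT
           vehicle_id,
           FIRST_VALUE(distance) OVER (PARTITION BY vehicle_id ORDER BY captured_at DESC)
         - FIRST_VALUE(distance) OVER (PARTITION BY vehicle_id ORDER BY captured_at ASC) AS diff
    FROM `updates`
) AS src;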
I need to calculate the average time of all the operations stored in the database. The table I store operations in looks as follows:
creation_time       | operation_type | operation_id
2017-01-03 11:14:25 | START          | 1
2017-01-03 11:14:26 | START          | 2
2017-01-03 11:14:28 | END            | 2
2017-01-03 11:14:30 | END            | 1
In this case operation 1 took 5 seconds and operation 2 took 2 seconds to finish.
How can I calculate the average of these operations in MySQL?
EDIT:
It seems that operation_id doesn't need to be unique - a given operation may be executed several times, so the table might look as follows:
creation_time       | operation_type | operation_id
2017-01-03 11:14:25 | START          | 1
2017-01-03 11:14:26 | START          | 2
2017-01-03 11:14:28 | END            | 2
2017-01-03 11:14:30 | END            | 1
2017-01-03 11:15:00 | START          | 1
2017-01-03 11:15:10 | END            | 1
What should I add in the query to properly calculate the average time of all these operations?
I'm not sure that a subquery is necessary...
SELECT AVG(TIME_TO_SEC(y.creation_time) - TIME_TO_SEC(x.creation_time)) avg_diff
FROM my_table x
JOIN my_table y
  ON y.operation_id = x.operation_id
 AND y.operation_type = 'end'
WHERE x.operation_type = 'start';
Since the END of an operation is always after the START, you can use MIN and MAX:
select avg(diff)
from (
    select operation_id,
           TIME_TO_SEC(TIMEDIFF(max(creation_time), min(creation_time))) as diff
    from your_table
    group by operation_id
) tmp
select avg(diff)
from (
    select a1.operation_id,
           TIME_TO_SEC(timediff(a2.creation_time, a1.creation_time)) as diff
    from oper a1 -- No table name provided, went with 'oper' because it made sense in my head
    inner join oper a2
        on a1.operation_id = a2.operation_id
    where a1.operation_type = 'START'
      and a2.operation_type = 'END'
) tmp
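None of the above pairs up repeated runs of the same operation_id (the case added in the edit). On MySQL 8.0+, one way to handle it is to number the START and END rows per operation and join the n-th START to the n-th END; a sketch, assuming the rows alternate cleanly and using the hypothetical table name operations:
SELECT AVG(TIMESTAMPDIFF(SECOND, s.creation_time, e.creation_time)) AS avg_seconds
FROM (
    SELECT operation_id, creation_time,
           ROW_NUMBER() OVER (PARTITION BY operation_id ORDER BY creation_time) AS run_no
    FROM operations
    WHERE operation_type = 'START'
) s
JOIN (
    SELECT operation_id, creation_time,
           ROW_NUMBER() OVER (PARTITION BY operation_id ORDER BY creation_time) AS run_no
    FROM operations
    WHERE operation_type = 'END'
) e ON e.operation_id = s.operation_id
   AND e.run_no = s.run_no;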
Using MySQL, I have a table that keeps track of user visits:
USER_ID | TIMESTAMP
--------+----------------------
1 | 2014-08-11 14:37:36
2 | 2014-08-11 12:37:36
3 | 2014-08-07 16:37:36
1 | 2014-07-14 15:34:36
1 | 2014-07-09 14:37:36
2 | 2014-07-03 14:37:36
3 | 2014-05-23 15:37:36
3 | 2014-05-13 12:37:36
The time of day is not important; I'm more concerned with answering "how many days between entries".
How do I go about figuring out the average number of days between entries using SQL queries?
For example, the output should look like something like:
(the output is just a sample, not a reflection of the data table above)
USER_ID | AVG TIME (days)
--------+----------------------
1 | 2
2 | 3
3 | 1
MySQL has no direct "get something from a previous row" capability. The easiest workaround is to use a variable to store that "previous" value:
SET @last = null;
SELECT user_id, AVG(diff)
FROM (
    SELECT user_id,
           IF(@last IS NULL, 0, DATEDIFF(`timestamp`, @last)) AS diff,
           @last := `timestamp`
    FROM yourtable
    ORDER BY user_id, `timestamp` ASC
) AS foo
GROUP BY user_id
The inner query does your "difference from previous row" calculations, and the outer query does the averaging.
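On MySQL 8.0+, LAG() avoids the session variable entirely and handles the per-user partitioning correctly; a sketch using the same yourtable name:
SELECT user_id, AVG(diff) AS avg_days
FROM (
    SELECT user_id,
           DATEDIFF(`timestamp`,
                    LAG(`timestamp`) OVER (PARTITION BY user_id ORDER BY `timestamp`)) AS diff
    FROM yourtable
) AS d
GROUP BY user_id;
AVG() simply skips the NULL diff produced by each user's first visit.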
Is there a way to multiply a column by a predefined number based on another column? There are multiple predefined numbers that are used, depending on the value in the column.
Example:
Table
Columns: persons_id,activity,scale
Values
1,swimming,4
1,baseball,2
1,basketball,3
2,swimming,6
2,basketball,3
If my predefined numbers are: 6 (swimming), 8 (baseball), 5 (basketball)
The output would look like this
1,swimming,4,24
1,baseball,2,16
1,basketball,3,15
2,swimming,6,36
2,basketball,3,15
Edit: Thank you everyone for contributing. I ended up using the solution from sgeddes.
Sure, you can use CASE:
SELECT Persons_Id, Activity, Scale,
Scale *
CASE
WHEN Activity = 'swimming' THEN 6
WHEN Activity = 'baseball' THEN 8
WHEN Activity = 'basketball' THEN 5
ELSE 1
END Total
FROM YourTable
Good luck.
Have another column called WEIGHT that multiplies the SCALE value. Perhaps you can calculate the product using a trigger to populate the column. Otherwise, a simple SELECT will do fine.
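A rough sketch of what such a trigger could look like, assuming hypothetical weight and total columns have been added to the table (neither is in the original schema):
-- Hypothetical: keep total = scale * weight in sync on insert
CREATE TRIGGER set_total
BEFORE INSERT ON YourTable
FOR EACH ROW
  SET NEW.total = NEW.scale * NEW.weight;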
You can use this query:
select persons_id, activity, scale,
scale * case when activity = 'swimming' then 6
when activity = 'baseball' then 8
when activity = 'basketball' then 5 end as result
from Table1
But a better solution would be to define a new table Coefficients(activity, coefficient)
so that you can insert rows:
'swimming', 6
'baseball', 8
'basketball', 5
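A minimal DDL sketch for that lookup table; the column types are assumptions:
-- Sketch: coefficients lookup table described above
CREATE TABLE Coefficients (
  activity    VARCHAR(20) PRIMARY KEY,
  coefficient INT NOT NULL
);
INSERT INTO Coefficients (activity, coefficient) VALUES
  ('swimming',   6),
  ('baseball',   8),
  ('basketball', 5);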
then use something like this:
select persons_id, Table1.activity, scale, scale * coefficient as result
from Table1 inner join Coefficients on Table1.activity = Coefficients.activity
You can also use a table that stores the values, or create a subquery that will return the multipliers:
select persons_id,
       t.activity,
       scale,
       scale * s.val as result
from yourtable t
inner join
(
    select 'swimming' activity, 6 val
    union all
    select 'baseball' activity, 8 val
    union all
    select 'basketball' activity, 5 val
) s
  on t.activity = s.activity
See SQL Fiddle with Demo
The result is:
| PERSONS_ID | ACTIVITY | SCALE | RESULT |
--------------------------------------------
| 1 | swimming | 4 | 24 |
| 1 | baseball | 2 | 16 |
| 1 | basketball | 3 | 15 |
| 2 | swimming | 6 | 36 |
| 2 | basketball | 3 | 15 |