I have a MySQL (8.0.16) table with sensor devices (~1k+ rows) and a table with lots of data (~25M+ rows) from the sensors. There are single-column indexes on the timestamp and m2m_device_id columns, plus a composite index on both columns.
I want to get the latest data row per device, and currently run one query per device:
SELECT * FROM `m2m_datas` WHERE (m2m_device_id = 980) ORDER BY timestamp DESC LIMIT 1;
SELECT * FROM `m2m_datas` WHERE (m2m_device_id = 981) ORDER BY timestamp DESC LIMIT 1;
SELECT * FROM `m2m_datas` WHERE (m2m_device_id = 982) ORDER BY timestamp DESC LIMIT 1;
and so on ...
This takes anywhere from 500ms to 4s, depending on the state of the db.
I thought I could improve on this by using subqueries or joins, both to be faster and to reduce the query count. So I first came up with something like this:
SELECT device.name, (
SELECT timestamp
FROM m2m_datas
WHERE m2m_device_id = device.id
ORDER BY timestamp DESC limit 1 ) d
FROM m2m_devices device;
For one, this takes even more time (about 10s to 15s), and I don't get all the columns of my data.
After some research I tried the following:
SELECT device.name, datapoint.timestamp, datapoint.user_data
FROM m2m_devices device
INNER JOIN m2m_datas datapoint ON datapoint.id = (
SELECT d.id FROM m2m_datas AS d WHERE d.m2m_device_id = device.id ORDER BY timestamp DESC LIMIT 1
)
At least I can get all of my data here if I want, but this took even more time (25s to 40s).
While experimenting, I came up with a slight variant of the above, which also lets me increase the LIMIT if needed:
SELECT device.name, datapoint.timestamp, datapoint.user_data
FROM m2m_devices device
INNER JOIN m2m_datas datapoint ON datapoint.id IN (
SELECT * FROM (
SELECT d.id FROM m2m_datas AS d WHERE d.m2m_device_id = device.id ORDER BY timestamp DESC LIMIT 1
) as t
)
Interestingly, this took less time (10s to 17s).
So I'm kind of out of ideas for improving the performance of these queries. It seems that running a separate query for each device is the best option.
Am I missing something here? Is there a better query that achieves the same result in at least the same time?
One common approach is as follows. Appropriately indexed, this should be fast enough for most cases...
SELECT x.*
FROM m2m_datas x
JOIN
( SELECT m2m_device_id
, MAX(timestamp) timestamp
FROM m2m_datas
GROUP
BY m2m_device_id
) y
ON y.m2m_device_id = x.m2m_device_id
AND y.timestamp = x.timestamp;
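To try the MAX()-and-join pattern above on a small data set, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so timings will not carry over; the sample rows, the user_data values, and the index name idx_device_ts are made up for illustration.

```python
import sqlite3


def latest_per_device(conn):
    """Return the newest row per device using the MAX()-and-join pattern."""
    sql = """
        SELECT x.m2m_device_id, x.timestamp, x.user_data
        FROM m2m_datas AS x
        JOIN (
            SELECT m2m_device_id, MAX(timestamp) AS timestamp
            FROM m2m_datas
            GROUP BY m2m_device_id
        ) AS y
          ON y.m2m_device_id = x.m2m_device_id
         AND y.timestamp = x.timestamp
        ORDER BY x.m2m_device_id
    """
    return conn.execute(sql).fetchall()


# In-memory SQLite database with made-up sample rows (assumption, not real data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE m2m_datas (
        id INTEGER PRIMARY KEY,
        m2m_device_id INTEGER NOT NULL,
        timestamp TEXT NOT NULL,
        user_data TEXT
    );
    -- Composite index so the grouped MAX() can be answered from the index.
    CREATE INDEX idx_device_ts ON m2m_datas (m2m_device_id, timestamp);
    INSERT INTO m2m_datas (m2m_device_id, timestamp, user_data) VALUES
        (980, '2019-06-01 10:00:00', 'a'),
        (980, '2019-06-01 11:00:00', 'b'),
        (981, '2019-06-01 09:30:00', 'c'),
        (982, '2019-06-01 08:00:00', 'd'),
        (982, '2019-06-01 12:00:00', 'e');
""")

rows = latest_per_device(conn)
for row in rows:
    print(row)
```

Note that if two rows of one device share the same maximum timestamp, this pattern returns both of them.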
If you are using MySQL 8+, then ROW_NUMBER might be helpful here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY m2m_device_id ORDER BY timestamp DESC) rn
FROM m2m_datas
WHERE m2m_device_id IN (980, 981, 982)
)
SELECT *
FROM cte
WHERE rn = 1;
To further speed up this approach, an index on (m2m_device_id, timestamp) might help.
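The ROW_NUMBER approach also covers the case of wanting more than one row per device: filter on rn <= n instead of rn = 1. Below is a rough sketch using Python's sqlite3 module as a stand-in for MySQL 8 (it needs an SQLite build with window functions, 3.25 or later). The helper name latest_n_per_device and the sample rows are assumptions for illustration only.

```python
import sqlite3


def latest_n_per_device(conn, device_ids, n=1):
    """Return the n newest rows for each of the given devices."""
    # One "?" placeholder per device id; values are bound, never interpolated.
    placeholders = ", ".join("?" for _ in device_ids)
    sql = f"""
        WITH cte AS (
            SELECT m2m_device_id, timestamp, user_data,
                   ROW_NUMBER() OVER (
                       PARTITION BY m2m_device_id
                       ORDER BY timestamp DESC
                   ) AS rn
            FROM m2m_datas
            WHERE m2m_device_id IN ({placeholders})
        )
        SELECT m2m_device_id, timestamp, user_data
        FROM cte
        WHERE rn <= ?
        ORDER BY m2m_device_id, timestamp DESC
    """
    return conn.execute(sql, [*device_ids, n]).fetchall()


# Made-up sample rows in an in-memory database (assumption, not real data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE m2m_datas (
        id INTEGER PRIMARY KEY,
        m2m_device_id INTEGER NOT NULL,
        timestamp TEXT NOT NULL,
        user_data TEXT
    );
    CREATE INDEX idx_device_ts ON m2m_datas (m2m_device_id, timestamp);
    INSERT INTO m2m_datas (m2m_device_id, timestamp, user_data) VALUES
        (980, '2019-06-01 10:00:00', 'a'),
        (980, '2019-06-01 11:00:00', 'b'),
        (981, '2019-06-01 09:30:00', 'c'),
        (982, '2019-06-01 08:00:00', 'd'),
        (982, '2019-06-01 12:00:00', 'e');
""")

latest = latest_n_per_device(conn, [980, 982])
latest_two = latest_n_per_device(conn, [980, 982], n=2)
print(latest)
print(latest_two)
```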
I have the following table
As you can see, it has 1868155 rows. I am attempting to make a real-time graph, but that is impossible since almost any query takes 1 or 2 seconds.
For example, this query
SELECT sensor.nombre, temperatura.temperatura
FROM sensor, temperatura
WHERE sensor.id = temperatura.idsensor
ORDER BY temperatura.fecha DESC, idsensor ASC
LIMIT 4
Is supposed to show this
I've tried everything: using indexes (perhaps not correctly), selecting only the fields I need instead of *, etc., but the results are the same!
These are the indexes of the table.
Explain of the query
EDITED
This is the explain of the query after implementing
ALTER TABLE temperatura
ADD INDEX `sensor_temp` (`idsensor`,`fecha`,`temperatura`)
And using inner join syntax for the query
SELECT s.nombre, t.temperatura
FROM sensor s
INNER JOIN temperatura t
ON s.id = t.idsensor
ORDER BY t.fecha DESC, t.idsensor ASC
LIMIT 4
This is my whole sensor table
Try the following:
ALTER TABLE temperatura
ADD INDEX `sensor_temp` (`idsensor`,`fecha`,`temperatura`)
I also recommend using modern join syntax:
SELECT s.nombre, t.temperatura
FROM sensor s
INNER JOIN temperatura t
ON s.id = t.idsensor
ORDER BY t.fecha DESC, t.idsensor ASC
LIMIT 4
Report the EXPLAIN again after making the above changes, if performance is still not good enough.
Attempt #2
After looking closely at what it appears you are trying to do, I believe this next query may be more effective:
SELECT
s.nombre, t.temperatura
FROM temperatura t
LEFT OUTER JOIN temperatura later_t
ON later_t.idsensor = t.idsensor
AND later_t.fecha > t.fecha
INNER JOIN sensor s
ON s.id = t.idsensor
WHERE later_t.idsensor IS NULL
ORDER BY t.idsensor ASC
You can also try:
SELECT
s.nombre, t.temperatura
FROM temperatura t
INNER JOIN (
SELECT
t.idsensor,
MAX(t.fecha) AS fecha
FROM temperatura t
GROUP BY t.idsensor
) max_fecha
ON max_fecha.idsensor = t.idsensor
AND max_fecha.fecha = t.fecha
INNER JOIN sensor s
ON s.id = t.idsensor
ORDER BY t.idsensor ASC
In my experience, if you are trying to find the most recent record, one of the two queries above will work. Which works best depends on various factors, so try them both.
Let me know how those perform, and if they still get you the data you want. Also, any query you run, run at least 3 times, and report all 3 times. That will help get an accurate measure of how fast a given query is, since various external factors can affect the speed of a query.
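To check that the two "most recent record" shapes agree before timing them on the real table, here is a small sketch. It runs on Python's sqlite3 module rather than MySQL, with made-up sensor names and readings, and it assumes the derived-table variant matches each row against its sensor's maximum fecha with an equality join.

```python
import sqlite3

# Variant 1: anti-join. Keep rows for which no later reading of the same sensor exists.
ANTI_JOIN = """
    SELECT s.nombre, t.temperatura
    FROM temperatura AS t
    LEFT JOIN temperatura AS later_t
           ON later_t.idsensor = t.idsensor
          AND later_t.fecha > t.fecha
    INNER JOIN sensor AS s ON s.id = t.idsensor
    WHERE later_t.idsensor IS NULL
    ORDER BY t.idsensor
"""

# Variant 2: derived table holding MAX(fecha) per sensor, joined back on equality.
MAX_JOIN = """
    SELECT s.nombre, t.temperatura
    FROM temperatura AS t
    INNER JOIN (
        SELECT idsensor, MAX(fecha) AS fecha
        FROM temperatura
        GROUP BY idsensor
    ) AS max_fecha
            ON max_fecha.idsensor = t.idsensor
           AND max_fecha.fecha = t.fecha
    INNER JOIN sensor AS s ON s.id = t.idsensor
    ORDER BY t.idsensor
"""

# Made-up sample data (assumption, not the asker's table).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sensor (id INTEGER PRIMARY KEY, nombre TEXT NOT NULL);
    CREATE TABLE temperatura (
        idsensor INTEGER NOT NULL,
        fecha TEXT NOT NULL,
        temperatura REAL NOT NULL
    );
    CREATE INDEX sensor_temp ON temperatura (idsensor, fecha, temperatura);
    INSERT INTO sensor (id, nombre) VALUES (1, 'norte'), (2, 'sur');
    INSERT INTO temperatura (idsensor, fecha, temperatura) VALUES
        (1, '2015-03-01 10:00:00', 20.0),
        (1, '2015-03-01 10:05:00', 21.5),
        (2, '2015-03-01 10:00:00', 18.5),
        (2, '2015-03-01 10:05:00', 19.0);
""")

anti_rows = conn.execute(ANTI_JOIN).fetchall()
max_rows = conn.execute(MAX_JOIN).fetchall()
print(anti_rows)
print(max_rows)
```

Both variants should return one row per sensor holding its newest reading; which one is faster on MySQL depends on the data and indexes, so EXPLAIN both.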
Before MySQL 8.0 (which added descending indexes), it is not possible to optimize a mixture of ASC and DESC, as in
ORDER BY t.fecha DESC, t.idsensor ASC
You tried a covering index:
INDEX `sensor_temp` (`idsensor`,`fecha`,`temperatura`)
However, this covering index may be better:
INDEX `sensor_temp` (`fecha`,`idsensor`,`temperatura`)
Then, if you are willing to get the sensors in a different order, use
ORDER BY t.fecha DESC, t.idsensor DESC
This will give you up to 4 sensors for the last fecha:
sensor: PRIMARY KEY(id)
temperatura: INDEX(fecha, idsensor, temperatura)
SELECT
( SELECT nombre FROM sensor WHERE id = t.idsensor ) AS nombre,
t.temperatura
FROM
( SELECT MAX(fecha) AS max_fecha FROM temperatura ) AS z
JOIN temperatura AS t ON t.fecha = z.max_fecha
ORDER BY t.idsensor ASC
LIMIT 4;
Let's say we've got a high-scores table with columns app_id, best_score, best_time, most_drops, longest_something and a couple more.
I'd like to collect the top three results IN EACH CATEGORY, grouped by app_id.
For now I'm using separate rank queries on each category in a loop:
SELECT app_id, best_something1,
FIND_IN_SET( best_something1,
(SELECT GROUP_CONCAT( best_something1
ORDER BY best_something1 DESC)
FROM highscores )) AS `rank`
FROM highscores
ORDER BY best_something1 DESC LIMIT 3;
Two things worth adding:
All columns for a specific app are updated at the same time (so I could consider creating a helper table).
The result of the prospective "turbo query" might be requested quite often, as often as the values are updated.
I'm quite basic with SQL and suspect that it has many more commands that, combined together, could do the magic.
What I'd hope for from this post is that some wise owl would at least point me in the right direction.
The sample table:
http://sqlfiddle.com/#!2/eef053/1
Here is a sample result too (already in JSON format, sorry):
{"total_blocks":[["13","174","1"],["9","153","2"],["10","26","3"]],"total_games":[["13","15","1"],["9","12","2"],["10","2","3"]],"total_score":[["13","410","1"],["9","332","2"],["11","88","3"]],"aver_pps":[["11","4.34011","1"],["13","2.64521","2"],["12","2.60623","3"]],"aver_drop_per_game":[["11","20","1"],["10","13","2"],["9","12.75","3"]],"aver_drop_val":[["11","4.4","1"],["13","2.35632","2"],["9","2.16993","3"]],"aver_score":[["11","88","1"],["9","27.6667","2"],["13","27.3333","3"]],"best_pps":[["13","4.9527","1"],["11","4.34011","2"],["9","4.13076","3"]],"most_drops":[["11","20","1"],["9","16","2"],["13","16","2"]],"longest_drop":[["9","3","1"],["13","2","2"],["11","2","2"]],"best_drop":[["11","42","1"],["13","36","2"],["9","30","3"]],"best_score":[["11","88","1"],["13","78","2"],["9","58","3"]]}
When I encounter this scenario, I prefer to employ the UNION clause, and combine the queries tailored to each ORDERing and LIMIT.
http://dev.mysql.com/doc/refman/5.1/en/union.html
UNION combines the result rows vertically (top 3 rows for 5 sort categories yields 15 rows).
For your specific purpose, you might then pivot them as sub-SELECTs, rolling them up with GROUP_CONCAT GROUPed on user so that each has the delimited list.
I'd test something like this query, to see if the performance is any better or not. I think this comes pretty close to satisfying the specification:
( SELECT 99 AS seq_
, a.category
, CONVERT(a.val,DOUBLE) AS val
, FIND_IN_SET(a.val,r.highest_vals) AS `rank`
, a.user_id
FROM ( SELECT 'total_blocks' AS category
, b.`total_blocks` AS val
, b.user_id
FROM app b
ORDER BY b.`total_blocks` DESC
LIMIT 3
) a
CROSS
JOIN ( SELECT GROUP_CONCAT(s.val ORDER BY s.val DESC) AS highest_vals
FROM ( SELECT t.`total_blocks` AS val
FROM app t
ORDER BY t.`total_blocks` DESC
LIMIT 3
) s
) r
ORDER BY a.val DESC
)
UNION ALL
( SELECT 97 AS seq_
, a.category
, CONVERT(a.val,DOUBLE) AS val
, FIND_IN_SET(a.val,r.highest_vals) AS `rank`
, a.user_id
FROM ( SELECT 'XXX' AS category
, b.`XXX` AS val
, b.user_id
FROM app b
ORDER BY b.`XXX` DESC
LIMIT 3
) a
CROSS
JOIN ( SELECT GROUP_CONCAT(s.val ORDER BY s.val DESC) AS highest_vals
FROM ( SELECT t.`XXX` AS val
FROM app t
ORDER BY t.`XXX` DESC
LIMIT 3
) s
) r
ORDER BY a.val DESC
)
ORDER BY seq_ DESC, val DESC
To unpack this a little bit... this is essentially a set of separate queries that are combined with the UNION ALL set operator.
Each of the queries returns a literal value to allow for ordering. (In this case, I've given the column a rather anonymous name, seq_ (sequence). If the specific order isn't important, this column could be removed.)
Each query is also returning a literal value that tells which "category" the row is for.
Because some of the values returned are INTEGER, and others are FLOAT, I'd cast all of those values to floating point, so the datatypes of each query line up.
For the FLOAT (floating point) type values, there can be a problem with comparison. So I'd go with casting those to decimal and stringing them together into a list using GROUP_CONCAT (as the original query does).
Since we are returning only three rows from each query, we only need to concatenate together the three largest values. (If there's a two way "tie" for first place, we'll return rank values of 1, 1, 3.)
Suitable indexes for each query will improve performance for large sets.
... ON app (total_blocks, user_id)
... ON app (best_pps,user_id)
... ON app (XXX,user_id)
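A compact way to experiment with the UNION ALL idea is to generate one "top three" branch per category and rank the rows afterwards. The sketch below does that with Python's sqlite3 module as a stand-in for MySQL. It is a simplification, not the query above: the rank is computed in Python instead of with FIND_IN_SET, and the CATEGORIES whitelist, the helper names, and the sample scores are all made up for illustration.

```python
import sqlite3

# Hypothetical whitelist of score columns; column names cannot be bound as parameters.
CATEGORIES = ("total_blocks", "best_pps")


def top_three_sql(categories):
    """Build one UNION ALL query with a top-three branch per category."""
    for c in categories:
        if c not in CATEGORIES:
            raise ValueError(f"unknown category: {c}")
    parts = [
        "SELECT * FROM ("
        f" SELECT '{c}' AS category, user_id, CAST({c} AS REAL) AS val"
        f" FROM app ORDER BY {c} DESC, user_id LIMIT 3)"
        for c in categories
    ]
    return " UNION ALL ".join(parts) + " ORDER BY category, val DESC, user_id"


def top_three(conn, categories=CATEGORIES):
    """Return {category: [(user_id, val, rank), ...]} with ties sharing a rank."""
    result = {}
    for category, user_id, val in conn.execute(top_three_sql(categories)):
        rows = result.setdefault(category, [])
        if rows and rows[-1][1] == val:
            rank = rows[-1][2]      # tie: reuse the previous rank (1, 1, 3)
        else:
            rank = len(rows) + 1    # otherwise the rank is the 1-based position
        rows.append((user_id, val, rank))
    return result


# Made-up sample scores (assumption, not the data from the fiddle).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app (
        user_id INTEGER PRIMARY KEY,
        total_blocks INTEGER NOT NULL,
        best_pps REAL NOT NULL
    );
    INSERT INTO app (user_id, total_blocks, best_pps) VALUES
        (1, 100, 2.5),
        (2, 100, 1.0),
        (3, 50, 4.0),
        (4, 10, 3.0);
""")

ranking = top_three(conn)
print(ranking)
```

With the two-way tie on total_blocks in the sample data, the tied users both get rank 1 and the next user gets rank 3, matching the tie behaviour described above.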
I need to perform a sort before the GROUP BY clause, but MySQL does not want to cooperate.
SELECT `article`, `date`, `aip`
FROM `official_prices`
WHERE `article` = 2003
GROUP BY `article`
ORDER BY `date` ASC
The row that should be picked is the one with the earliest date (2013-07-15), but instead it picks the row that comes first in table order. Changing to DESC makes no difference.
First image shows both rows, ungrouped. Second image is them being grouped.
This table is being joined to by a main query, so (I think) any solutions involving LIMIT 1 won't be useful to me.
Full query:
SELECT `articles`.*, `official_prices`.`aip`
FROM `articles`
LEFT JOIN `official_prices`
ON (`official_prices`.`article` = `articles`.`id`)
GROUP BY `articles`.`id`, `official_prices`.`article`
ORDER BY `official_prices`.`date` ASC, `articles`.`name`
You can't use GROUP BY and ORDER BY like that. The ordering applies only to the complete result set being returned, not to the rows within each group. This will work:
select o1.*
from official_prices o1
inner join
(
SELECT `article`, min(`date`) as mdate
from `official_prices`
WHERE `article` = 2003
GROUP BY `article`
) o2 on o1.article = o2.article and o1.date = o2.mdate
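The same derived-table join can be carried over to the full query by dropping the WHERE filter and left-joining from articles, so that articles without any price still appear. This is only a sketch, run on Python's sqlite3 module instead of MySQL; the name column values and the prices are invented sample data.

```python
import sqlite3

# Earliest official price per article; articles with no price yield NULLs.
EARLIEST_PRICE_SQL = """
    SELECT a.id, a.name, o1.date, o1.aip
    FROM articles AS a
    LEFT JOIN (
        SELECT article, MIN(date) AS mdate
        FROM official_prices
        GROUP BY article
    ) AS o2 ON o2.article = a.id
    LEFT JOIN official_prices AS o1
           ON o1.article = o2.article
          AND o1.date = o2.mdate
    ORDER BY a.name
"""

# Made-up sample data (assumption, not the asker's tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE official_prices (
        article INTEGER NOT NULL,
        date TEXT NOT NULL,
        aip REAL NOT NULL
    );
    INSERT INTO articles (id, name) VALUES
        (2003, 'Widget'), (2004, 'Gadget'), (2005, 'Nothing');
    INSERT INTO official_prices (article, date, aip) VALUES
        (2003, '2013-08-01', 12.5),
        (2003, '2013-07-15', 10.0),
        (2004, '2013-09-01', 7.25);
""")

earliest = conn.execute(EARLIEST_PRICE_SQL).fetchall()
for row in earliest:
    print(row)
```

Swapping MIN for MAX gives the most recent price instead of the earliest one.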
What you are trying to do is simply incorrect. The ordering before the group by does not have a (guaranteed) effect on the results.
My guess is that you want to get the most recent date and aip for that date. Here is a better approach:
SELECT `article`, max(`date`),
substring_index(group_concat(`aip` order by date desc), ',', 1) as lastAip
FROM `official_prices`
WHERE `article` = 2003
GROUP BY `article`;
The only downside is that the group_concat() will convert any value to a string. If it is some other type (and a string poses problems), then convert it back to the desired type.
Actually, an even better approach is to skip the group by entirely, because you are already filtering down to one article:
select article, `date`, aip
from official_prices
where article = 2003
order by `date` desc
limit 1;
The first approach works for multiple articles.
EDIT:
Your full query is:
SELECT `articles`.*, `official_prices`.`aip`
FROM `articles` LEFT JOIN
`official_prices`
ON `official_prices`.`article` = `articles`.`id`
GROUP BY `articles`.`id`, `official_prices`.`article`
ORDER BY `official_prices`.`date` ASC, `articles`.`name`;
You are looking for more than one article, so the second approach won't work. So, use the first:
SELECT `articles`.*,
substring_index(group_concat(`official_prices`.`aip` order by `official_prices`.`date` desc),
',', 1) as lastAIP
FROM `articles` LEFT JOIN
`official_prices`
ON `official_prices`.`article` = `articles`.`id`
GROUP BY `articles`.`id`, `official_prices`.`article`
ORDER BY `articles`.`name`;
I was looking for a way to use one query with a subquery instead of two separate queries.
But it turns out that the subquery is executed for each row in the result set. Is there a way to make the count subquery run only once within a combined query?
SELECT `ad_general`.`id`,
( SELECT count(`ad_general`.`id`) AS count
FROM (`ad_general`)
WHERE `city` = 708 ) AS count
FROM (`ad_general`)
WHERE `ad_general`.`city` = '708'
ORDER BY `ad_general`.`id` DESC
LIMIT 15
Maybe using a join can solve the problem, but I don't know how.
SELECT ad_general.id, stats.cnt
FROM ad_general
JOIN (
SELECT count(*) as cnt
FROM ad_general
WHERE city = 708
) AS stats
WHERE ad_general.city = 708
ORDER BY ad_general.id DESC
LIMIT 15;
The explicit table names aren't required, but are used both for clarity and maintainability (the explicit table names will prevent any ambiguities should the schema for ad_general or the generated table ever change).
You can self-join (join the table to itself) and apply an aggregate function to the second copy.
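For anyone who wants to try the derived-table join from application code, here is a minimal sketch with bound parameters. It uses Python's sqlite3 module as a stand-in for MySQL, and the sample rows and the page size of 2 are made up; the point is only that the count is computed once in the derived table and repeated on every row of the page.

```python
import sqlite3

# The derived table "stats" has exactly one row, so the join adds cnt to every row.
PAGE_WITH_TOTAL_SQL = """
    SELECT ad_general.id, stats.cnt
    FROM ad_general
    CROSS JOIN (
        SELECT COUNT(*) AS cnt
        FROM ad_general
        WHERE city = :city
    ) AS stats
    WHERE ad_general.city = :city
    ORDER BY ad_general.id DESC
    LIMIT :page_size
"""

# Made-up sample rows (assumption, not the asker's table).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ad_general (id INTEGER PRIMARY KEY, city INTEGER NOT NULL);
    CREATE INDEX idx_city ON ad_general (city);
    INSERT INTO ad_general (id, city) VALUES
        (1, 708), (2, 708), (3, 100), (4, 708), (5, 708);
""")

page = conn.execute(PAGE_WITH_TOTAL_SQL, {"city": 708, "page_size": 2}).fetchall()
print(page)
```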
SELECT `adgen`.`id`, COUNT(`adgen_count`.`id`) AS `count`
FROM `ad_general` AS `adgen`
JOIN `ad_general` AS `adgen_count` ON `adgen_count`.city = 708
WHERE `adgen`.`city` = 708
GROUP BY `adgen`.`id`
ORDER BY `adgen`.`id` DESC
LIMIT 15
However, it's impossible to say what the appropriate grouping is without knowing the structure of the table.