Fast query slows down when within a subquery - mysql

I have this query:
SELECT timestmp
FROM devicelog
WHERE device_id = 5
ORDER BY id DESC LIMIT 1
which takes less than 0.001 seconds to fetch, but once I put it in a subquery, it slows down to about 3.05 seconds. Any reason why it does this, or how I can remedy it?
Here is the second query (which is the one I want to optimize):
SELECT device.id,
(SELECT timestmp
FROM devicelog
WHERE device_id = device.id
ORDER BY id DESC LIMIT 1) as timestmp
FROM device
Table "device" only has like 10-15 records in it (devicelog has several million), so I would assume it goes 1 by 1 through each record and then executes the subquery, but obviously it's doing something else. The PK of devicelog is the id, and the PK in device is its id as well. There is an index on devicelog for timestmp (which is a datetime) and device_id which is also a FK back to devicelog. There are other indices as well, but they are irrelevant (things like names, descriptions, etc).
I just need it to loop through devices, then display the last timestamp record.
If I list each device in PHP and then perform the first query separately for each one, it performs extraordinarily well, but I want to do this in a single query. I could do something like (pseudocode):
foreach ($row in <device>)
    query('<first query> WHERE device_id = $row[id]')
Doing an entire join would be too expensive on devicelog just because of the high row count.

Per the comment on the first answer offered, your question and query do not match what you are looking for.
What you can do is pre-aggregate on a per-device basis to get the latest log time per device, then join that to your master list of devices...
SELECT
    d.name,
    d.id,
    DeviceMax.lastTime
FROM device d
LEFT JOIN ( SELECT dl.device_id,
                   MAX(dl.timestmp) AS lastTime
            FROM devicelog dl
            GROUP BY dl.device_id ) AS DeviceMax
       ON d.id = DeviceMax.device_id
Now, if you needed other columns from the device log for that specific entry, you could just add on to that...
LEFT JOIN devicelog dl2
       ON DeviceMax.device_id = dl2.device_id
      AND DeviceMax.lastTime = dl2.timestmp
Then you can pull any other columns from the "dl2" alias into your query.
Also, for your devicelog table, I would have a covering index on (device_id, timestmp).
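For example, that index could be created like this (a sketch; the index name is an assumption):
CREATE INDEX idx_devicelog_device_time ON devicelog (device_id, timestmp);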
COMMENT FROM FEEDBACK
Then I would offer this suggestion, which is also common in web development when someone needs the "highest count", the "most" of something, the "most recent", etc.
Denormalize your device table in just one respect: add a column for the LastDeviceLogID. Then, whenever a devicelog entry is added, use an AFTER INSERT trigger that does...
update Device
set LastDeviceLogID = newRecord.DeviceLogID
where ID = newRecord.Device_ID
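Spelled out as an actual MySQL trigger, that could look like the following (a minimal sketch; devicelog.id and devicelog.device_id come from the question, the rest are assumptions):
DELIMITER $$
CREATE TRIGGER devicelog_after_insert
AFTER INSERT ON devicelog
FOR EACH ROW
BEGIN
    -- record the id of the newest log row on the parent device
    UPDATE device
       SET LastDeviceLogID = NEW.id
     WHERE id = NEW.device_id;
END$$
DELIMITER ;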
Columns may not be exact, but the principle is there. This way, you never need to do a LIMIT 1, MAX(), etc. and churn through your millions of records; you can get it as simply as:
SELECT
    d.name,
    d.id,
    dl.timestmp,
    dl.othercolumns
FROM device d
LEFT JOIN devicelog dl
       ON d.LastDeviceLogID = dl.id

Try this with an INNER JOIN:
SELECT d.id, dl.timestmp
FROM device d
INNER JOIN devicelog dl
        ON dl.device_id = d.id
ORDER BY dl.id DESC LIMIT 1
To list all devices, you may consider GROUP BY:
SELECT d.id, MAX(dl.timestmp) AS timestmp
FROM device d
INNER JOIN devicelog dl
        ON dl.device_id = d.id
GROUP BY d.id
ORDER BY d.id DESC
Edit:
If you have many indices, then it is better to index your id column. Try this:
ALTER TABLE `device` ADD INDEX `id` (`id`)

Related

MySQL Spring complicated query - ways to order and query efficiency

I run this complicated query on a Spring JPA repository.
My goal is to get all the info from the sites table, ordered by the severity of the events on each site.
This is my query:
SELECT alls.* FROM sites AS alls JOIN
(
SELECT distinct ets.id FROM
(
SELECT s.id, et.`type`, et.severity_level, COUNT(et.`type`) FROM sites AS s
JOIN users_sites AS us ON (s.id=us.site_id)
JOIN users AS u ON (us.user_id=u.user_id)
JOIN areas AS a ON (s.id=a.site_id)
JOIN panels AS p ON (a.id=p.area_id)
JOIN events AS e ON (p.id=e.panel_id)
JOIN event_types AS et ON (e.event_type_id=et.id)
WHERE u.user_id="98765432-123a-1a23-123b-11a1111b2cd3"
GROUP BY s.id , et.`type`, et.severity_level
ORDER BY et.severity_level, COUNT(et.`type`) DESC
) AS ets
) as etsd ON alls.id = etsd.id
The second SELECT (the one with DISTINCT) returns site_ids ordered correctly by severity.
Note that there are different event_types + severity levels in each site, and I use pagination on the result, so I need the DISTINCT.
The problem is that the main SELECT doesn't keep this order.
Is there any way to keep the order in one complicated query?
Another related question: one of my ideas was making two queries:
1. The "select distinct" query that returns the order --> saved in a list, "order list".
2. The main "sites" query (which becomes very simple) with "where id in {order list}".
3. Order the second query's results in code by "order list".
I run the query every 10 seconds, so it is very performance-sensitive.
Which seems faster in this case: the original complicated query, or those two?
Any insight will be appreciated.
Thanks a lot.
A quirk of SQL's declarative, set-oriented syntax for us procedural programmers: ORDER BY clauses in subqueries are not carried through to the outer query, except sometimes by accident. If you want ordering at any query level, you must specify it at that level or you will get unpredictable results. Query optimizers are usually smart enough to avoid wasting sort operations.
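A contrived illustration using the sites table from the question (the inner ORDER BY may be optimized away; only the outer one is guaranteed to order the final result):
SELECT t.id
FROM (SELECT id FROM sites ORDER BY id DESC) AS t  -- not guaranteed to survive
ORDER BY t.id DESC;                                -- this one is guaranteed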
Your requirement: return at most one sites row for each sites.id value, ordered by the worst event. Worst: the lowest event severity, and if there is more than one event with the lowest severity, the largest count.
Use this sort of thing to get the "worst" for each id, in place of DISTINCT.
SELECT id, MIN(severity_level) severity_level, MAX(num) num
FROM (
/* your inner query, with COUNT(et.`type`) aliased as num */
) ets
GROUP BY id
This gives at most one row per sites.id value. Then your outer query is
SELECT alls.*
FROM sites alls
JOIN (
SELECT id, MIN(severity_level) severity_level, MAX(num) num
FROM (
/* your inner query, with COUNT(et.`type`) aliased as num */
) ets
GROUP BY id
) worstevents ON alls.id = worstevents.id
ORDER BY worstevents.severity_level, worstevents.num DESC, alls.id
Putting it all together:
SELECT alls.*
FROM sites alls
JOIN (
SELECT id, MIN(severity_level) severity_level, MAX(num) num
FROM (
SELECT s.id, et.severity_level, COUNT(et.`type`) num
FROM sites AS s
JOIN users_sites AS us ON (s.id=us.site_id)
JOIN users AS u ON (us.user_id=u.user_id)
JOIN areas AS a ON (s.id=a.site_id)
JOIN panels AS p ON (a.id=p.area_id)
JOIN events AS e ON (p.id=e.panel_id)
JOIN event_types AS et ON (e.event_type_id=et.id)
WHERE u.user_id="98765432-123a-1a23-123b-11a1111b2cd3"
GROUP BY s.id , et.`type`, et.severity_level
) ets
GROUP BY id
) worstevents ON alls.id = worstevents.id
ORDER BY worstevents.severity_level, worstevents.num DESC, alls.id
An index on users.user_id will help performance for these single-user queries.
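If that index is not already present (for example, as the primary key), it could be added like so (a sketch; the index name is an assumption):
CREATE INDEX idx_users_user_id ON users (user_id);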
If you still have performance trouble, please read this and ask another question.

How can I speed up a multiple inner join query?

I have two tables. The first table (users) is a simple "id, username" with 100,000 rows, and the second (stats) is "id, date, stat" with 20M rows.
I'm trying to figure out which username's stat went up the most, and here's the query I have. On a powerful machine, this query takes minutes to complete. Is there a better way to write it to speed it up?
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON (b.id=a.id)
INNER JOIN stats AS c ON (c.id=a.id)
WHERE b.date = '2016-01-10'
AND c.date = '2016-01-13'
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
The other way I tried, which doesn't seem optimal, is:
SELECT a.id, a.username,
(SELECT b.stat FROM stats AS b WHERE b.id = a.id AND b.date = '2016-01-10') AS start,
(SELECT c.stat FROM stats AS c WHERE c.id = a.id AND c.date = '2016-01-14') AS `end`,
((SELECT b.stat FROM stats AS b WHERE b.id = a.id AND b.date = '2016-01-10') -
(SELECT c.stat FROM stats AS c WHERE c.id = a.id AND c.date = '2016-01-14')) AS stat_diff
FROM users AS a
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
Introduction
Let's suppose we rewrite the statement like this:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON
b.date = STR_TO_DATE('2016-01-10', '%Y-%m-%d' ) and b.id=a.id
INNER JOIN stats AS c ON
c.date = STR_TO_DATE('2016-01-13', '%Y-%m-%d' ) and c.id=a.id
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
And we ensure that:
the users table has an index on field id;
stats has an index on the composite fields (date, id): create index stats_idx_d_i on stats ( date, id );
Then
The database optimizer may use the indexes to select a Restricted Set of Data ('RSD'), that is, the rows that match the filtered dates. This is fast.
But
You are sorting by a calculated field:
(b.stat - c.stat) AS stat_diff  # <-- calculated
ORDER BY stat_diff DESC         # <-- this forces it to be calculated
There is no possible optimization of this sort step, because stat_diff must be calculated one by one for all the rows in your 'RSD' (restricted set of data).
Conclusion
The question is: how many rows are in your 'RSD'? If there are only a few hundred rows, your query may run fast; otherwise, it will be slow.
In any case, you should make sure the first step of the query (without the sort) is resolved by the indexes and not by full scanning. Use the EXPLAIN command to be sure.
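For example, prefix the rewritten query with EXPLAIN and check the key column of the output (a sketch):
EXPLAIN
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON b.date = '2016-01-10' AND b.id = a.id
INNER JOIN stats AS c ON c.date = '2016-01-13' AND c.id = a.id;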
All you need to do is help the optimizer. At a bare minimum, have a checklist that looks like the one below:
1. Are my join columns indexed?
2. Are the WHERE clauses sargable (see the sketch after this list)?
3. Are there any implicit or explicit conversions?
4. Am I seeing any statistics issues?
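On point 2, a sketch of the difference against the stats table above (table and column names follow the question):
-- Non-sargable: wrapping the indexed column in a function prevents index use.
SELECT id, stat FROM stats WHERE YEAR(date) = 2016;
-- Sargable: a range over the bare column lets the (date, id) index be used.
SELECT id, stat FROM stats WHERE date >= '2016-01-01' AND date < '2017-01-01';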
One more interesting aspect to look at is how your data is distributed; once you understand the data, you will be able to interpret the execution plan and alter it as per your need.
EX:
Say I have a customers table with 100 rows, and each customer has a minimum of 10 orders (up to 10,000 orders in total). Now if you need to find only the top 3 orders by date, you don't want a scan happening on the orders table.
Now in your case, I would not go with the second option, even though the optimizer may choose a good plan for it as well. I would take the first approach and see if the execution time is acceptable; if not, I would go through my checklist and try to tune it further.
The query seems OK; verify your indexes.
Or
Try this query:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN (select id,stat from stats where date = '2016-01-10') AS b ON (b.id=a.id)
INNER JOIN (select id,stat from stats where date = '2016-01-13') AS c ON (c.id=a.id)
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100

mysql query is taking too much time to execute

Hello everyone, I am working on a database through phpMyAdmin. Whenever I try to execute a query, it takes too much time, more than 10 minutes, to show results. Is there any way to speed it up? Please respond.
The query is
SELECT ib.*, b.brand_name, m.model_name,
s.id as sale_id, br.branch_code,br.branch_name,r.rentry_date,r.id as rid
from in_book ib
left join brand b on ib.brand_id=b.id
left join model m on ib.vehicle_id=m.id
left join re_entry r on r.in_book_id=ib.id
left join sale s on ib.id=s.in_book_id
left join branch br on ib.branch_id=br.id
where ib.id !=''
and ib.branch_id='65'
group by ib.id
order by r.id ASC,
count(r.in_book_id) DESC ,
ib.purchaes_date ASC,
ib.id ASC
There are almost 7 tables.
Make sure you have an index on every key you use to join the tables.
From http://dev.mysql.com/doc/refman/5.5/en/optimization-indexes.html:
The best way to improve the performance of SELECT operations is to create indexes on one or more of the columns that are tested in the query. The index entries act like pointers to the table rows, allowing the query to quickly determine which rows match a condition in the WHERE clause, and retrieve the other column values for those rows. All MySQL data types can be indexed.
This of course also applies to the JOIN conditions.
You don't list any such indexes; however, I would start with the following suggested indexes:
table      index
in_book    ( branch_id, id, brand_id, vehicle_id )
brand      ( id, brand_name )
model      ( id, model_name )
re_entry   ( in_book_id, id, rentry_date )
sale       ( in_book_id, id )
branch     ( id )
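Possible DDL for those (a sketch; the index names are assumptions, and branch(id) is likely already covered by its primary key):
CREATE INDEX idx_in_book_join ON in_book (branch_id, id, brand_id, vehicle_id);
CREATE INDEX idx_brand_name ON brand (id, brand_name);
CREATE INDEX idx_model_name ON model (id, model_name);
CREATE INDEX idx_re_entry_book ON re_entry (in_book_id, id, rentry_date);
CREATE INDEX idx_sale_book ON sale (in_book_id, id);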
Also, with MySQL, you can use the special keyword STRAIGHT_JOIN, which tells the engine to join the tables in the order you have listed them. Although you are doing LEFT JOINs, I don't think it will matter, as it appears the secondary tables are all lookup-type tables and in_book is your primary. As a try, it would be...
SELECT STRAIGHT_JOIN (...rest of query...)
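In other words, applied to the question's query it would look like this (a sketch, abbreviated to a couple of the joins):
SELECT STRAIGHT_JOIN ib.*, b.brand_name, br.branch_name
FROM in_book ib
LEFT JOIN brand b ON ib.brand_id = b.id
LEFT JOIN branch br ON ib.branch_id = br.id
WHERE ib.branch_id = '65';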

Making query efficient

I have come across a query which, while working, is hard to understand (and change) and in my opinion is unoptimized.
SELECT cp.`order` AS `order`, cp.parent_id, cp.id AS category_id, cp.stub, cp.name as category_name, dc.deals_in_cat, d.*
FROM category_parent cp,
(
SELECT id, title, subtitle, image, image_m, discount, itemid, price, new_price, catalog_id, property_id, seller_id, category FROM deals
WHERE deals.category = 1
AND itemid NOT IN (156785431)
ORDER BY e_order LIMIT 8
) d,
(
SELECT a.`id` AS parent_id, COUNT( DISTINCT c.`itemid` ) AS deals_in_cat
FROM `category_parent` AS a
LEFT JOIN `navigation_filters_weightage` AS d ON a.`id` = d.`cat_id`,
`deals_parent_cat` AS b,
`deals` AS c
WHERE a.`parent_id` = b.`id`
AND c.`category` = a.`id` GROUP BY a.id ORDER BY b.`order` ASC , a.`order` ASC
) AS dc
WHERE cp.id = d.category
AND cp.active = '1'
AND dc.parent_id = cp.id;
Can you please suggest ways of making it simpler?
Thanks
As noted in comments, indexes are probably a big factor for your query.
I would start by confirming you have at least the following indexes available:
table            index
deals            ( category, e_order, itemid )
category_parent  ( active, id )
Typically, I would put itemid before the order-by column since it is part of the WHERE clause, but since you are getting all EXCEPT one ID, I think the order-by column will help out more.
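Corresponding DDL could look like this (a sketch; the index names are assumptions):
CREATE INDEX idx_deals_cat_order ON deals (category, e_order, itemid);
CREATE INDEX idx_category_parent_active ON category_parent (active, id);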
One additional note... Your "dc" query for getting counts is doing the counts for ALL entries, but your outer query is only considering "active=1". I would add this qualifier in your "dc" query as well via
WHERE a.Active='1' AND -- rest of your criteria
Finally, this being a website, doing counts on the fly repeatedly is always going to be a big performance hit. As suggested in other posts, and again here, you may be better off adding a column to your category_parent table for "deals_in_cat" and having it updated via a trigger whenever any deals are added or removed. This way, the count is done ONCE when a deal is added or deleted, and all future references no longer require counting. This will probably be the best thing you can do for performance.
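As a minimal sketch of such triggers (assuming a deals_in_cat column has been added to category_parent; the other names follow the question's schema):
DELIMITER $$
CREATE TRIGGER deals_after_insert
AFTER INSERT ON deals
FOR EACH ROW
BEGIN
    -- one more deal in this category
    UPDATE category_parent
       SET deals_in_cat = deals_in_cat + 1
     WHERE id = NEW.category;
END$$

CREATE TRIGGER deals_after_delete
AFTER DELETE ON deals
FOR EACH ROW
BEGIN
    -- one fewer deal in this category
    UPDATE category_parent
       SET deals_in_cat = deals_in_cat - 1
     WHERE id = OLD.category;
END$$
DELIMITER ;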

Nested Query Optimization

SELECT id
FROM `events`
WHERE heading LIKE '%somestr%'
AND guests_since_start > 10
AND (SELECT COUNT(*)
FROM `submissions`
WHERE `submissions`.event_id = `events`.id)!=0;
Currently this query crashes my server. The idea is to get all ids from the events table that have corresponding rows in the submissions table (events and submissions are linked by event_id). As long as the number of rows is not 0, I would like those records to be displayed.
The problem is that the events table has approximately 3,000 records, while the submissions table has approximately 290,000.
Any help on the matter is appreciated.
A direct join will filter it for you and be more efficient:
SELECT DISTINCT id
FROM `events`
INNER JOIN `submissions`
ON `submissions`.event_id = `events`.id
WHERE heading LIKE '%somestr%'
AND guests_since_start > 10
DISTINCT is required to filter out duplicates. This reduces the efficiency a bit (since MySQL now has to remove dups), but overall it'll be much faster than any subquery. Indexes on submissions.event_id and events.id will help further.
First I'd rewrite the query as:
SELECT e.id
FROM events e
WHERE e.heading LIKE '%somestr%'
AND e.guests_since_start > 10
AND EXISTS (SELECT 1 FROM submissions s WHERE s.event_id = e.id);
Or, as Matt S pointed out (I believe his may be slightly faster, but I prefer the reduced syntax):
SELECT DISTINCT e.id
FROM events e
JOIN submissions s
ON s.event_id = e.id
WHERE e.heading LIKE '%somestr%'
AND e.guests_since_start > 10
You were doing an unnecessary count on submissions.
I'd also make sure I had indexes covering (e.id, e.guests_since_start) and (s.event_id).
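Those could be created like so (a sketch; the index names are assumptions, and events.id is likely already covered by the primary key):
CREATE INDEX idx_events_id_guests ON events (id, guests_since_start);
CREATE INDEX idx_submissions_event ON submissions (event_id);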