SELECT id
FROM `events`
WHERE heading LIKE '%somestr%'
AND guests_since_start > 10
AND (SELECT COUNT(*)
     FROM `submissions`
     WHERE `submissions`.event_id = `events`.id) != 0;
Currently this query crashes my server. The idea is to get all IDs from the events table that have corresponding rows in the submissions table (events and submissions are linked by an event_id). As long as the number of matching rows is not 0, I would like those records to be displayed.
The problem is the events table has approximately 3,000 records, while the submissions table has approximately 290,000.
Any help on the matter is appreciated.
A direct join will filter it for you and be more efficient:
SELECT DISTINCT id
FROM `events`
INNER JOIN `submissions`
ON `submissions`.event_id = `events`.id
WHERE heading LIKE '%somestr%'
AND guests_since_start > 10
DISTINCT is required to filter out duplicates. This reduces efficiency a bit (since MySQL now has to remove duplicates), but overall it'll be much faster than any subquery. Indexes on submissions.event_id and events.id will help further.
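If that index on submissions.event_id doesn't already exist, something like this should create it (the index name is illustrative; events.id is presumably already the primary key):
ALTER TABLE `submissions` ADD INDEX `idx_event_id` (`event_id`);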
First I'd rewrite the query as:
SELECT e.id
FROM events e
WHERE e.heading LIKE '%somestr%'
AND e.guests_since_start > 10
AND EXISTS (SELECT 1 FROM submissions s WHERE s.event_id = e.id);
Or, as Matt S pointed out (I believe his may be slightly faster, but I prefer the terser syntax):
SELECT DISTINCT e.id
FROM events e
JOIN submissions s
ON s.event_id = e.id
WHERE e.heading LIKE '%somestr%'
AND e.guests_since_start > 10
You were doing an unnecessary count on submissions.
I'd also make sure I had indexes covering (e.id, e.guests_since_start) and (s.event_id).
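For instance (index names are illustrative; skip any that already exist):
ALTER TABLE events ADD INDEX idx_id_guests (id, guests_since_start);
ALTER TABLE submissions ADD INDEX idx_event_id (event_id);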
Related
I run this complicated query against a Spring JPA repository.
My goal is to get all the info from the sites table, ordering it by event severity on each site.
This is my query:
SELECT alls.* FROM sites AS alls JOIN
(
SELECT distinct ets.id FROM
(
SELECT s.id, et.`type`, et.severity_level, COUNT(et.`type`) FROM sites AS s
JOIN users_sites AS us ON (s.id=us.site_id)
JOIN users AS u ON (us.user_id=u.user_id)
JOIN areas AS a ON (s.id=a.site_id)
JOIN panels AS p ON (a.id=p.area_id)
JOIN events AS e ON (p.id=e.panel_id)
JOIN event_types AS et ON (e.event_type_id=et.id)
WHERE u.user_id="98765432-123a-1a23-123b-11a1111b2cd3"
GROUP BY s.id , et.`type`, et.severity_level
ORDER BY et.severity_level, COUNT(et.`type`) DESC
) AS ets
) as etsd ON alls.id = etsd.id
The second select (the one with "distinct") returns site_ids ordered correctly by severity.
Note that each site has different event types and severities, and I use pagination on the result, so I need the DISTINCT.
The problem is - the main select doesn't keep this order.
Is there any way to keep the order in one complicated query?
Another related question: one of my ideas was making two queries:
The "select distinct" query that returns the order --> saved in a list, "order list".
The main "sites" query (which becomes very simple) with "where id in {order list}".
Then order the second query's results in code by "order list".
I use the query every 10 seconds, so it is very performance-sensitive.
Which seems faster in this case - the original complicated query or the two-query approach?
Any insight will be appreciated.
Thanks a lot.
A quirk of SQL's declarative set-oriented syntax for us procedural programmers: ORDER BY clauses in subqueries are not carried through to the outer query, except sometimes by accident. If you want ordering at any query level, you must specify it at that level or you will get unpredictable results. The query optimizers are usually smart enough to avoid wasting sort operations.
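A minimal illustration of the principle, using your sites table:
-- The derived table's ORDER BY may be optimized away:
SELECT t.id FROM (SELECT id FROM sites ORDER BY id DESC) AS t;
-- To guarantee an order, sort at the outermost level:
SELECT t.id FROM (SELECT id FROM sites) AS t ORDER BY t.id DESC;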
Your requirement: give at most one sites row for each sites.id value, ordered by the worst event. Worst: lowest severity_level, and if more than one event type shares that lowest severity, the largest count.
Use this sort of thing to get the "worst" for each id, in place of DISTINCT.
SELECT id, MIN(severity_level) severity_level, MAX(num) num
FROM (
/* your inner query */
) ets
GROUP BY id
This gives at most one row per sites.id value. Then your outer query is
SELECT alls.*
FROM sites alls
JOIN (
SELECT id, MIN(severity_level) severity_level, MAX(num) num
FROM (
/* your inner query */
) ets
GROUP BY id
) worstevents ON alls.id = worstevents.id
ORDER BY worstevents.severity_level, worstevents.num DESC, alls.id
Putting it all together:
SELECT alls.*
FROM sites alls
JOIN (
SELECT id, MIN(severity_level) severity_level, MAX(num) num
FROM (
SELECT s.id, et.severity_level, COUNT(et.`type`) num
FROM sites AS s
JOIN users_sites AS us ON (s.id=us.site_id)
JOIN users AS u ON (us.user_id=u.user_id)
JOIN areas AS a ON (s.id=a.site_id)
JOIN panels AS p ON (a.id=p.area_id)
JOIN events AS e ON (p.id=e.panel_id)
JOIN event_types AS et ON (e.event_type_id=et.id)
WHERE u.user_id="98765432-123a-1a23-123b-11a1111b2cd3"
GROUP BY s.id , et.`type`, et.severity_level
) ets
GROUP BY id
) worstevents ON alls.id = worstevents.id
ORDER BY worstevents.severity_level, worstevents.num DESC, alls.id
An index on users.user_id will help performance for these single-user queries.
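If it is missing (and user_id is not already the primary key), something like this adds it; the index name is illustrative:
ALTER TABLE users ADD INDEX idx_user_id (user_id);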
If you still have performance trouble, please read this and ask another question.
I'm running two queries.
The first one gets unique IDs. This executes in ~350ms.
select parent_id
from duns_match_sealed_air_072815
group by duns_number
Then I paste those IDs into this second query. With over 10k IDs pasted in, it also executes in about 350ms.
select term, count(*) as count
from companies, business_types, business_types_to_companies
where
business_types.id = business_types_to_companies.term_id
and companies.id = business_types_to_companies.company_id
and raw_score > 25
and diversity = 1
and company_id in (paste,ten,thousand,ids,here)
group by term
order by count desc;
When I combine these queries into one it takes a long time to execute. I don't know how long because I stopped it after minutes.
select term, count(*) as count
from companies, business_types, business_types_to_companies
where
business_types.id = business_types_to_companies.term_id
and companies.id = business_types_to_companies.company_id
and raw_score > 25
and diversity = 1
and company_id in (
select parent_id
from duns_match_sealed_air_072815
group by duns_number
)
group by term
order by count desc;
What is going on?
It's down to the way MySQL processes the query - I believe it has to run your embedded query once for each outer row, whereas using two queries lets you store the intermediate result.
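You can check this with EXPLAIN: on older MySQL versions the IN () subquery typically shows up with select_type DEPENDENT SUBQUERY, meaning it is re-evaluated for each outer row (MySQL 5.6+ can often materialize the subquery instead):
EXPLAIN
select term, count(*) as count
from companies, business_types, business_types_to_companies
where business_types.id = business_types_to_companies.term_id
and companies.id = business_types_to_companies.company_id
and raw_score > 25
and diversity = 1
and company_id in (
select parent_id
from duns_match_sealed_air_072815
group by duns_number
)
group by term;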
Hope this helps!
The query has been rewritten using JOINs, and in particular I've used EXISTS instead of IN. This is a shot in the dark. It's possible that the subquery generates many values, causing the outer query to struggle as it matches each item returned from the subquery.
select term, count(*) as count
from companies c
inner join business_types_to_companies bc on bc.company_id = c.id
inner join business_types b on b.id = bc.term_id
where
raw_score > 25
and diversity = 1
and exists (
select 1
from duns_match_sealed_air_072815
where parent_id = c.id
)
group by term
order by count desc;
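For the EXISTS probe to be fast, there should be an index on the column it searches (index name is illustrative):
ALTER TABLE duns_match_sealed_air_072815 ADD INDEX idx_parent_id (parent_id);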
First, with respect, your subquery doesn't use GROUP BY in a sensible way.
select parent_id /* wrong GROUP BY */
from duns_match_sealed_air_072815
group by duns_number
In fact, it misuses the pernicious MySQL extension to GROUP BY. Read this: http://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html. I can't tell what your application logic intends from this query, but I can tell you that it actually returns an unpredictably selected parent_id value associated with each distinct duns_number value.
Do you want
select MIN(parent_id) parent_id
from duns_match_sealed_air_072815
group by duns_number
or something like that? That one selects the lowest parent ID associated with each given number.
Sometimes MySQL has a hard time optimizing the WHERE .... IN () query pattern. Try a join instead. Like this:
select term, count(*) as count
from companies
join (
select MIN(parent_id) parent_id
from duns_match_sealed_air_072815
group by duns_number
) idlist ON companies.id = idlist.parent_id
join business_types_to_companies ON companies.id = business_types_to_companies.company_id
join business_types ON business_types.id = business_types_to_companies.term_id
where raw_score > 25
and diversity = 1
group by term
order by count desc
To optimize this further we'll need to see the table definitions and the output from EXPLAIN.
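To gather that, run SHOW CREATE TABLE for each table involved, and prefix the rewritten query itself with EXPLAIN to see the chosen plan:
SHOW CREATE TABLE companies;
SHOW CREATE TABLE business_types;
SHOW CREATE TABLE business_types_to_companies;
SHOW CREATE TABLE duns_match_sealed_air_072815;
-- then run EXPLAIN on the query above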
I am doing a sub-query join to another table because I want to be able to sort the results I get back. I only need the first row, but I need the rows ordered a certain way so that I get the lowest id.
I tried adding LIMIT 1 to the subquery, but then the full query returned 0 results; so now it has no limit, and EXPLAIN shows two rows that scan the full 10k+ rows of the auction_media table.
I wrote it this way to avoid having to query the auction_media table for each row separately, but now I'm thinking this way isn't so great if it has to scan the whole auction_media table.
Which way is better? The way I have it or querying the auction_media table separately? ...or is there a better way!?
Here is the code:
SELECT
a.auction_id,
a.name,
media.media_url
FROM
auctions AS a
LEFT JOIN users AS u ON u.user_id=a.owner_id
INNER JOIN ( SELECT media_id,media_url,auction_id
FROM auction_media
WHERE media_type=1
AND upload_in_progress=0
ORDER BY media_id ASC
) AS media
ON a.auction_id=media.auction_id
WHERE a.hpfeat=1
AND a.active=1
AND a.approved=1
AND a.closed=0
AND a.creation_in_progress=0
AND a.deleted=0
AND (a.list_in='auction' OR u.shop_active='1')
GROUP BY a.auction_id;
Edit: Through my testing, using the above query seems like it would be the much faster method overall; however, I worry whether that will still be the case when the auction_media table grows to 1M rows or so.
Edit: As stated in the comments, DISTINCT is not required, because each auctions row can be associated with (at most) one users row and one row from the inner query.
You may want to try this. Since you don't have any aggregate function, the outer query's GROUP BY isn't needed (per the edit above, neither is DISTINCT). The inner query was replaced by a query that finds the smallest media_id per auction_id, then JOINed back to get the media_url. (Since I didn't know whether media_id and auction_id form a composite unique key, I used the same WHERE clause to help eliminate potential duplicates.)
SELECT
a.auction_id,
a.name,
auction_media.media_url
FROM auctions AS a
LEFT JOIN users AS u
ON u.user_id=a.owner_id
INNER JOIN (SELECT auction_id, MIN(media_id) AS media_id
FROM auction_media
WHERE media_type=1
AND upload_in_progress=0
GROUP BY auction_id) AS media
ON a.auction_id=media.auction_id
INNER JOIN auction_media
ON auction_media.media_id = media.media_id
AND auction_media.auction_id = media.auction_id
AND auction_media.media_type=1
AND auction_media.upload_in_progress=0
WHERE a.hpfeat=1
AND a.active=1
AND a.approved=1
AND a.closed=0
AND a.creation_in_progress=0
AND a.deleted=0
AND (a.list_in='auction' OR u.shop_active='1');
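If auction_media does grow toward 1M rows (per the edit above), a composite index covering the inner query's filters, its grouping column, and the MIN() column should keep the derived table cheap. Something like this, with an illustrative name - check what indexes already exist first:
ALTER TABLE auction_media
    ADD INDEX idx_media_lookup (media_type, upload_in_progress, auction_id, media_id);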
I have this query:
SELECT timestmp
FROM devicelog
WHERE device_id = 5
ORDER BY id DESC LIMIT 1
which takes less than 0.001 seconds to fetch, but once I put it in a subquery, it slows down to about 3.05 seconds. Any reason why it does this, or how I can remedy it?
Here is the second query (which is the one I want to optimize):
SELECT device.id,
(SELECT timestmp
FROM devicelog
WHERE device_id = device.id
ORDER BY id DESC LIMIT 1) as timestmp
FROM device
Table "device" only has like 10-15 records in it (devicelog has several million), so I would assume it goes 1 by 1 through each record and then executes the subquery, but obviously it's doing something else. The PK of devicelog is the id, and the PK in device is its id as well. There is an index on devicelog for timestmp (which is a datetime) and device_id which is also a FK back to devicelog. There are other indices as well, but they are irrelevant (things like names, descriptions, etc).
I just need it to loop through devices, then display the last timestamp record.
If I list each device in PHP, then perform the first query separately for each one, it performs extraordinarily well, but I want to do this in one entire query. Like, I could do something like (pseudocode):
foreach($row in <device>)
query('<first query> where device_id = $row[id]')
Doing an entire join would be too expensive on devicelog just because of the high row count.
Your question and query don't quite match what you are looking for, per the comment on the first answer offered.
What you can do is a pre-aggregate on a per-device basis to get the max ID log, then join that to your master list of devices...
SELECT
d.name,
d.id,
DeviceMax.lastTime
from
device d
LEFT JOIN ( select dl.device_id,
max( dl.timestmp ) lastTime
from
devicelog dl
group by
dl.device_id ) as DeviceMax
ON d.id = DeviceMax.device_id
Now, if you needed other stuff from the device log for that specific entry, we could just add on to that...
LEFT JOIN devicelog dl2
on DeviceMax.Device_id = dl2.Device_id
AND DeviceMax.lastTime = dl2.timestmp
then you can get any other columns from the "dl2" alias added to your query.
Also, for your device log table, I would have a covering index on (device_id, timestmp).
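For example (index name is illustrative, using the question's timestmp column):
ALTER TABLE devicelog ADD INDEX idx_device_time (device_id, timestmp);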
COMMENT FROM FEEDBACK
Then I would offer this suggestion, something also common in the world of web development when someone needs the "highest count", the "most" of something, the "most recent", etc.
Denormalize your Device table in only one respect: add a column for the LastDeviceLogID. Then, whenever a DeviceLog entry is added, an AFTER INSERT trigger does something like...
CREATE TRIGGER devicelog_after_insert
AFTER INSERT ON devicelog
FOR EACH ROW
    UPDATE device
    SET LastDeviceLogID = NEW.id
    WHERE id = NEW.device_id;
Columns and names may not be exact, but the principle is there. This way, you never need to do a LIMIT 1, MAX(), etc. and go through your millions of records; you can get it as simply as doing
SELECT
d.name,
d.id,
dl.timestmp,
dl.othercolumns
from
device d
LEFT JOIN devicelog dl
on d.LastDeviceLogID = dl.DeviceLogID
Try this with an inner join:
SELECT d.id, dl.timestmp
FROM device d
INNER JOIN devicelog dl
ON dl.device_id = d.id
ORDER BY dl.id DESC LIMIT 1
To list all devices, you may consider GROUP BY:
SELECT d.id, MAX(dl.timestmp) AS timestmp
FROM device d
INNER JOIN devicelog dl
ON dl.device_id = d.id
GROUP BY d.id
ORDER BY d.id DESC
Edit:
If you have many records, then it is better to index your id column.
Try this:
ALTER TABLE `device` ADD INDEX `id` (`id`)
I have got this query:
SELECT
t.type_id, t.product_id, u.account_id, t.name, u.username
FROM
types AS t
INNER JOIN
( SELECT user_id, username, account_id
FROM users WHERE account_id=$account_id ) AS u
ON
t.user_id = u.user_id
ORDER BY
t.type_id DESC
1st question:
It takes around 30 seconds to do this at the moment, with only 18k records in the types table.
The only indexes at the moment are the primary indexes on id.
Would the long time be caused by a lack of more indexes? Or would it be more to do with the structure of this query?
2nd question:
How can I add the LIMIT so I only get 100 records with the highest type_id?
Without changing the results, I think it will be a hundred times faster if you don't make a sub-select of your users table. It is not needed at all in this case.
You can just add LIMIT 100 to get only the first 100 results (or fewer if there aren't 100).
SELECT SQL_CALC_FOUND_ROWS /* Calculate the total number of rows, without the LIMIT */
t.type_id, t.product_id, u.account_id, t.name, u.username
FROM
types t
INNER JOIN users u ON u.user_id = t.user_id
WHERE
u.account_id = $account_id
ORDER BY
t.type_id DESC
LIMIT 100
Then, execute a second query to get the total number of rows that is calculated.
SELECT FOUND_ROWS()
That sub select on MySQL is going to slow down your query. I'm assuming that this
SELECT user_id, username, account_id
FROM users WHERE account_id=$account_id
doesn't return many rows at all. If that's the case then the sub select alone won't explain the delay you're seeing.
Try throwing an index on user_id in your types table. Without it, you're doing a full table scan of 18k records for each record returned by that sub select.
Inner-join the users table, add that index, and I bet you'll see a huge increase in speed.
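Something along these lines (index name is illustrative):
ALTER TABLE types ADD INDEX idx_user_id (user_id);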