How can I optimize this SQL query with nested SELECTs? - mysql

I have the following query:
SELECT src_big, created, modified, owner, aid, caption
FROM photo
WHERE aid IN (SELECT aid, modified FROM album
              WHERE owner IN (SELECT uid2 FROM friend
                              WHERE uid1 = me() OR uid2 = me())
              ORDER BY modified DESC)
ORDER BY created DESC
LIMIT 30
This runs pretty slowly, and I'm sure it's because of the nested SELECTs. How can I make it perform faster? How should it be rewritten to be better optimized?

Try using joins instead of subqueries; they are faster:
SELECT photo.src_big, photo.created, photo.modified,
       photo.owner, photo.aid, photo.caption
FROM photo
INNER JOIN album ON album.aid = photo.aid
INNER JOIN friend ON album.owner = friend.uid2
WHERE friend.uid1 = me() OR friend.uid2 = me()
ORDER BY modified DESC, created DESC
LIMIT 30
Note: you need to qualify ambiguous column names with their table names.

We have several instances in our Facebook games where this sort of functionality is necessary, and we had lots of problems with it. In MySQL it seems that each row of the outer query reruns the nested query, which in turn reruns the one nested in it, making it very slow. We found that returning the results of the inner query to PHP, concatenating them, and then running the next query with the compiled list was much faster. I'm not sure there is a solution entirely inside MySQL, but this approach works quite well for us.
Another potential problem could be indexing: you will want to make sure that all of the columns you are searching or ordering by are properly indexed. Using MySQL's EXPLAIN on your query will also help to find problems.
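Splitting it apart as described might look like this (a sketch assuming the table and column names from the question; the id lists from each step are concatenated in PHP and interpolated into the next step):
-- 1) get the friend ids
SELECT uid2 FROM friend WHERE uid1 = me() OR uid2 = me();
-- 2) get the album ids owned by those friends (list from step 1)
SELECT aid FROM album WHERE owner IN (/* ids from step 1 */);
-- 3) get the photos (list from step 2)
SELECT src_big, created, modified, owner, aid, caption
FROM photo
WHERE aid IN (/* aids from step 2 */)
ORDER BY created DESC
LIMIT 30;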

You could make it a join:
select (stuff)
from photo
join album on photo.aid = album.aid
join friend on album.owner = friend.uid2
where friend.uid1 = me() or friend.uid2 = me()
order by created desc
limit 30
BUT be aware: since you are using stored functions this will never get into the query cache.
You can see what's happening with your query by prefacing it with DESC (a synonym for EXPLAIN); this will show you how the optimizer is dealing with it.
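For example (a sketch reusing the join above; DESC here is the EXPLAIN synonym, not the sort direction):
DESC SELECT photo.aid
FROM photo
JOIN album ON photo.aid = album.aid
JOIN friend ON album.owner = friend.uid2
WHERE friend.uid1 = me() OR friend.uid2 = me();
The output lists, per table, which key (index) the optimizer chose and roughly how many rows it expects to examine.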

Related

Query takes too long to run

I am running the query below to retrieve the unique latest result based on a date field within the same table. But this query takes too much time as the table grows. Any suggestion to improve this is welcome.
select t2.*
from (
    select (
        select id
        from ctc_pre_assets ti
        where ti.ctcassettag = t1.ctcassettag
        order by ti.createddate desc
        limit 1
    ) lid
    from (
        select distinct ctcassettag
        from ctc_pre_assets
    ) t1
) ro, ctc_pre_assets t2
where t2.id = ro.lid
order by id
Our table may contain the same row multiple times, but each with a different timestamp. My objective: based on a single column (for example assettag), I want to retrieve a single row for each assettag with the latest timestamp.
It's simpler, and probably faster, to find the newest date for each ctcassettag and then join back to find the whole row that matches.
This does assume that no ctcassettag has multiple rows with the same createddate, in which case you can get back more than one row per ctcassettag.
SELECT ctc_pre_assets.*
FROM ctc_pre_assets
INNER JOIN (
    SELECT ctcassettag, MAX(createddate) AS createddate
    FROM ctc_pre_assets
    GROUP BY ctcassettag
) newest
    ON  newest.ctcassettag = ctc_pre_assets.ctcassettag
    AND newest.createddate = ctc_pre_assets.createddate
ORDER BY ctc_pre_assets.id
EDIT: To deal with multiple rows with the same date.
You haven't actually said how to pick which row you want in the event that multiple rows are for the same ctcassettag on the same createddate. So, this solution just chooses the row with the lowest id from amongst those duplicates.
SELECT ctc_pre_assets.*
FROM ctc_pre_assets
WHERE ctc_pre_assets.id = (
    SELECT lookup.id
    FROM ctc_pre_assets lookup
    WHERE lookup.ctcassettag = ctc_pre_assets.ctcassettag
    ORDER BY lookup.createddate DESC, lookup.id ASC
    LIMIT 1
)
This does still use a correlated sub-query, which is slower than a simple nested sub-query (such as my first answer), but it does deal with the "duplicates".
You can change the rules on which row to pick by changing the ORDER BY in the correlated sub-query.
It's also very similar to your own query, but with one less join.
Nested queries are known to take longer than conventional queries. Can you prepend EXPLAIN to the query and post your results here? That will help us analyse the exact query/table that is taking longer to respond.
Check if the table has indexes. Unindexed tables are not advisable (unless obviously required to be unindexed) and are alarmingly slow in executing queries.
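For this query, a composite index covering both the lookup column and the sort column would be the usual starting point (the index name is hypothetical; table and columns are from the question):
CREATE INDEX idx_assettag_createddate
    ON ctc_pre_assets (ctcassettag, createddate);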
On the contrary, I think the best approach is to avoid writing nested queries altogether. Better: run each of the queries separately and then use the results (in array or list format) in the second query.
First, some questions that you should at least ask yourself, and maybe also answer for us to improve the accuracy of our responses:
Is your data normalized? If yes, maybe you should make an exception to avoid this brutal subquery problem
Are you using indexes? If yes, which ones, and are you using them to the fullest?
Some suggestions to improve the readability and maybe performance of the query:
- Use joins
- Use group by
- Use aggregators
Example (untested, so might not work, but should give an impression):
SELECT t2.*
FROM (
    SELECT id AS lid
    FROM ctc_pre_assets
    GROUP BY ctcassettag
    HAVING createddate = MAX(createddate)
    ORDER BY ctcassettag DESC
) ro
INNER JOIN ctc_pre_assets t2 ON t2.id = ro.lid
ORDER BY id
Using normalization is great, but there are a few caveats where normalization causes more harm than good. This seems like one of those situations, but without your tables in front of me, I can't tell for sure.
Using distinct the way you are doing, I can't help but get the feeling you might not get all relevant results - maybe someone else can confirm or deny this?
It's not that subqueries are all bad, but they tend to create massive scalability issues if written incorrectly. Make sure you use them the right way (google it?)
Indexes can potentially save you a bunch of time - if you actually use them. It's not enough to set them up; you have to write queries that actually use your indexes. Google this as well.
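As a quick check (a sketch; it assumes an index like the one suggested earlier exists, and 'TAG-001' is a made-up tag value), EXPLAIN shows whether the index is actually chosen:
EXPLAIN SELECT id
FROM ctc_pre_assets
WHERE ctcassettag = 'TAG-001'
ORDER BY createddate DESC
LIMIT 1;
The index name should appear in the key column of the output, and the rows estimate should be small.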

MySQL Group By and Order by not playing nicely together

I have read a few posts on this, but I can't seem to fix my problem.
I am calling two database queries to populate two arrays that run side by side of each other, but they aren't matching, as the order they come out in is different. I believe it has something to do with the GROUP BY, and this may require a subquery, but again I'm a little lost...
Query 1:
SELECT count(bids_bid.total_bid), bidtime_bid, users_usr.company_usr, users_usr.id_usr
FROM bids_bid
INNER JOIN users_usr
ON bids_bid.user_bid = users_usr.id_usr
WHERE auction_bid = 36
GROUP BY user_bid
ORDER BY bidtime_bid ASC
Query 2:
SELECT auction_bid, user_bid, bidtime_bid, bids_bid.total_bid
FROM bids_bid
WHERE auction_bid = 36
ORDER BY bidtime_bid ASC
Even though the ORDER BY is the same, the results aren't matching. The users are coming out in a different sequence.
I hope this makes sense, and thanks in advance.
* Update *
I just wanted to add a bit of clarity on what the output I want is. I need to show only one result per user (user_bid); the second query shows all users' rows. I only need the first query to show the first row entered for each user. So if I could order before the GROUP BY, by min date, that would be ace...
It's to be expected. You're fetching fields that are NOT involved in the grouping and are not part of an aggregate function. MySQL allows such things, but generally the results of the ungrouped/unaggregated fields can be wonky.
Because MySQL is free to choose WHICH of the potentially multiple 'free' rows to use for the actual result row, you will get different results. Generally it picks the first-encountered 'free choice' result, but that's not defined/guaranteed.
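A sketch of a deterministic version of the first query, where every selected column is either the grouping column or an aggregate (table and column names taken from the question):
SELECT user_bid,
       MIN(bidtime_bid) AS first_bid,
       COUNT(total_bid) AS bid_count
FROM bids_bid
WHERE auction_bid = 36
GROUP BY user_bid
ORDER BY first_bid ASC;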
You use grouping when you want unique results in the result set according to some group id (column name). Usually grouping is used with aggregate functions such as MIN, MAX, COUNT, SUM, etc.
Ordering in an inner query has nothing to do with the final result set. I suggest reading some introductory tutorials about grouping; treat SQL as a set-based language - most of set theory applies to SQL - and you'll be fine.
So I was complicating issues that I didn't need to. The solution I found is below.
SELECT users_usr.company_usr,
       users_usr.id_usr,
       bids_bid.bidtime_bid,
       MIN(bidtime_bid) AS minbid
FROM bids_bid
INNER JOIN users_usr ON bids_bid.user_bid = users_usr.id_usr
WHERE auction_bid = 36
GROUP BY id_usr
ORDER BY minbid ASC
Thanks everyone for making me look (try) harder...

mysql max query then JOIN?

I have followed the tutorial over at Tizag for the MAX() MySQL function and have written the query below, which does exactly what I need. The only trouble is I need to JOIN it to two more tables so I can work with all the rows I need.
$query = "SELECT idproducts, MAX(date) FROM results GROUP BY idproducts ORDER BY MAX(date) DESC";
I have this query below, which has the JOIN I need and works:
$query = ("SELECT *
FROM operators
JOIN products
ON operators.idoperators = products.idoperator JOIN results
ON products.idProducts = results.idproducts
ORDER BY drawndate DESC
LIMIT 20");
Could someone show me how to merge the top query with the JOIN element from my second query? I am new to PHP and MySQL, this being my first adventure into a computer language. I have read and tried real hard to get those two queries to work, but I am at a brick wall. I cannot work out how to add the JOIN element to the first query :(
Could some kind person take pity on a newb and help me?
Try this query:
SELECT *
FROM operators
JOIN products ON operators.idoperators = products.idoperator
JOIN (
    SELECT idproducts, MAX(date)
    FROM results
    GROUP BY idproducts
) AS t ON products.idproducts = t.idproducts
ORDER BY drawndate DESC
LIMIT 20
JOINs function somewhat independently of aggregation functions; they just change the intermediate result set upon which the aggregate functions operate. I like to point to the way the MySQL documentation is written: the SELECT syntax uses the term 'table_reference', and the JOIN syntax expands on what that means. Basically, any simple query which has a table specified can expand that table into a complete JOIN clause and the query will operate the same basic way, just with a modified intermediate result set.
I say "intermediate result-set" to hint at the mindset which helped me understand JOINS and aggregation. Understanding the order in which MySQL builds your final result is critical to knowing how to reliably get the results you want. Generally, it starts by looking at the first row of the first table you specify after 'FROM', and decides if it might match by looking at 'WHERE' clauses. If it is not immediately discardable, it attempts to JOIN that row to the first JOIN specified, and repeats the "will this be discarded by WHERE?". This repeats for all JOINs, which either add rows to your results set, or remove them, or leaves just the one, as appropriate for your JOINs, WHEREs and data. This process builds what I am referring to when I say "intermediate result-set". Somewhere between starting and finishing your complete query, MySQL has in it's memory a potentially massive table-like structure of data which it built using the process I just described. Only then does it begin to aggregate (GROUP) the results according to your criteria.
So for your query, it depends on what specifically you are going for (it's not entirely clear in the OP). If you simply want the MAX(date) from the second query, you can add that expression to the SELECT clause and then add an aggregation spec to the end:
SELECT *, MAX(date)
FROM operators
...
GROUP BY idproducts
ORDER BY ...
Alternatively, you can add the JOIN section of the second query to the first.
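That merge might look like this (an untested sketch; it assumes drawndate and the join columns are as in the question, and that you want one row per product):
SELECT results.idproducts, MAX(results.date)
FROM operators
JOIN products ON operators.idoperators = products.idoperator
JOIN results ON products.idProducts = results.idproducts
GROUP BY results.idproducts
ORDER BY MAX(results.date) DESC
LIMIT 20;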

Complex MySQL Query is Slow

A program I've been working on uses a complex MySQL query to combine information from several tables that have matching item IDs. However, since I added the subqueries you see below, the query has gone from taking under 1 second to execute to over 3 seconds. Do you have any suggestions for what I might do to optimize this query to be faster? Am I wrong in my thinking that having one complex query is better than having 4 or 5 smaller queries?
SELECT uninet_articles.*,
       UNIX_TIMESTAMP(uninet_articles.gmt),
       uninet_comments.commentcount,
       uninet_comments.lastposter,
       UNIX_TIMESTAMP(uninet_comments.maxgmt)
FROM uninet_articles
RIGHT JOIN (
    SELECT aid,
           (SELECT poster
            FROM uninet_comments AS a
            WHERE b.aid = a.aid
            ORDER BY gmt DESC
            LIMIT 1) AS lastposter,
           COUNT(*) AS commentcount,
           MAX(gmt) AS maxgmt
    FROM uninet_comments AS b
    GROUP BY aid
    ORDER BY maxgmt DESC
    LIMIT 10
) AS uninet_comments
    ON uninet_articles.aid = uninet_comments.aid
LIMIT 10
Queries can be thought of as going through the data to find what matches. Sub-queries require going through the data many times in order to find which items are needed. In this case, you probably want to rewrite it as multiple queries. Many times, multiple simpler queries will be better - I think this is one of those cases.
You can also look at if your indexes are working well - if you know what that is. The reason why has to do with this: How does database indexing work?.
For a specific suggestion, you can find the last poster for each AID in a different query, and simply join it afterwards.
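A sketch of that split, with the tables from the question (the ? placeholders are filled in by the application):
-- 1) the ten newest threads and their stats, with no per-row subquery
SELECT aid, COUNT(*) AS commentcount, MAX(gmt) AS maxgmt
FROM uninet_comments
GROUP BY aid
ORDER BY maxgmt DESC
LIMIT 10;
-- 2) for each (aid, maxgmt) pair from step 1, fetch the last poster
SELECT poster
FROM uninet_comments
WHERE aid = ? AND gmt = ?
LIMIT 1;
The article rows can then be joined (or matched in code) to the step 1 results by aid.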
It always depends on the data you have and the way you use it.
You should use EXPLAIN on your SELECTs to see whether you are using the indexes or not.
http://dev.mysql.com/doc/refman/5.5/en/explain.html

MySQL - Select # of rows as well as other data at same time w/o considering LIMIT

So I have a MySQL query that is being created dynamically based on parameters passed to my PHP script. It is working great; however, I would like to be able to fetch the number of rows it would return before applying the LIMIT statement. It seems to me like this could not be done in one query, but there is so much about MySQL I don't know that it would not surprise me in the slightest if you could.
I have looked around and found out that COUNT(*) is much faster than mysql_num_rows(), but as far as I know I can't use either in my case, since they would simply be limited by how many results I have specified in my LIMIT, which is usually 30. Also, I would rather not submit a query that doesn't contain the LIMIT just to get the number of rows it would return, and then immediately apply the LIMIT and submit the same query.
Here is a sample query:
SELECT DISTINCT w.id, w.resolution, w.views, w.colors, w.favorites
FROM wallpapers AS w
LEFT JOIN (tagmap AS tm, tags AS t)
ON (
w.id = tm.wallpaper_id
AND
tm.tag_id = t.tag_id
) WHERE w.ratio =
(SELECT ratio FROM wallpapers WHERE resolution = '1920 x 1080' LIMIT 1)
AND (
t.tag
IN (
'portal'
)
)
AND (w.verify = 2)
ORDER BY w.id DESC LIMIT 30
I've cleaned that up and spent a lot of time optimising it, so hopefully it is easy for you to read - but also feel free to mention anything else you spot funky in the code that may help speed it up ^^
Back to the question: one way I see to do this would be to stick the conditionals (everything but the SELECT, JOIN, GROUP BY, and LIMIT, basically) from that query onto a SELECT COUNT(*) statement, run that, and then run the actual query to get all the data. Is that the most efficient way, or does somebody have a better idea? I would love to hear all possible solutions, thanks! :)
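That two-query approach might look like this (a sketch; it reuses the question's WHERE clauses unchanged, with COUNT(DISTINCT w.id) standing in for the DISTINCT select list):
SELECT COUNT(DISTINCT w.id)
FROM wallpapers AS w
LEFT JOIN (tagmap AS tm, tags AS t)
ON (w.id = tm.wallpaper_id AND tm.tag_id = t.tag_id)
WHERE w.ratio =
(SELECT ratio FROM wallpapers WHERE resolution = '1920 x 1080' LIMIT 1)
AND t.tag IN ('portal')
AND (w.verify = 2);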
I think what you are looking for is SQL_CALC_FOUND_ROWS:
SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name
WHERE id > 100 LIMIT 10;
SELECT FOUND_ROWS();
It's still two queries though.