I'm working on a project involving words and their translations. One of the queries a translator must run frequently (once every 10 seconds or so) is:
SELECT * FROM pwords p
LEFT JOIN words w ON p.id = w.wordid
WHERE w.code IS NULL
OR (w.code <> "USER1" AND w.code <> "USER2")
ORDER BY rand() LIMIT 10
This fetches a word to be translated which the user has not already translated. In this case we also want to exclude words entered by USER2.
The pwords table has around 66k entries and the words table has around 55k entries.
This query takes about 500 seconds to complete, whereas if I remove the IS NULL check it takes 0.0245 ms. My question here is: is there a way to optimize this query? I really need to squeeze these numbers down.
The scenario is: USER1 does not want any of USER2's entries from the words table, nor its own entries from the same table. Therefore I need the IS NULL check, or a similar method, to get entries from all users except USER1 and USER2, whether those are entries from other users or NULL entries.
tl;dr So my question is: is there a way to make this query run faster? Is "IS NULL" optimizable?
Any and all help is greatly appreciated.
You can try using a subquery so that the filtering (the WHERE condition) happens as early as possible:
SELECT *
FROM pwords p
LEFT JOIN
  (SELECT *
   FROM words w
   WHERE (w.code <> "USER1" AND w.code <> "USER2")) subq
  ON p.id = subq.wordid
WHERE subq.wordid IS NULL
ORDER BY rand() LIMIT 10
Another (and maybe more efficient) option is using a NOT EXISTS clause:
SELECT *
FROM pwords p
WHERE NOT EXISTS
  (SELECT *
   FROM words w
   WHERE p.id = w.wordid AND (w.code <> "USER1" AND w.code <> "USER2"))
ORDER BY rand() LIMIT 10
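For either variant, the correlated lookup into words only pays off if it can be resolved from an index. Assuming no suitable index exists yet, one covering the join key and the code filter is worth trying (the index name here is illustrative):
CREATE INDEX idx_words_wordid_code ON words (wordid, code);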
Here is the query that takes too much time to execute.
SELECT c.id,c.currentBalance,
(SELECT running_balance from vrCorporateLedger WHERE company_id=c.id
ORDER by id DESC LIMIT 1) AS ledgerBalance
FROM company AS c
WHERE c.vrCorporate='YES'
AND c.deleted_at IS NULL
HAVING currentBalance > ledgerBalance
This should do the trick:
SELECT c.id, c.currentBalance,
vrcl.running_balance ledgerBalance
FROM company AS c
INNER JOIN vrCorporateLedger vrcl
ON vrcl.company_id = c.id
WHERE c.vrCorporate='YES'
AND c.deleted_at IS NULL
AND c.currentBalance > vrcl.running_balance
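Note that the original subquery picked only the latest ledger row per company (ORDER BY id DESC LIMIT 1), which a plain join does not replicate. If that behaviour matters, one sketch is to join through the latest ledger id per company (the latest alias is illustrative):
SELECT c.id, c.currentBalance,
vrcl.running_balance ledgerBalance
FROM company AS c
INNER JOIN (SELECT company_id, MAX(id) AS max_id
            FROM vrCorporateLedger
            GROUP BY company_id) latest ON latest.company_id = c.id
INNER JOIN vrCorporateLedger vrcl ON vrcl.id = latest.max_id
WHERE c.vrCorporate='YES'
AND c.deleted_at IS NULL
AND c.currentBalance > vrcl.running_balance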
I'm struggling to make a query efficient enough. I'm using the Doctrine2 ORM (the query is built with QueryBuilder) and part of my query is running very slowly: about 4s with a table of 5000 rows.
This is the relevant part of db schema:
TABLE user
id (primary)
... (plenty of columns, not relevant to the query)
TABLE slot
id (primary)
user_id (foreign for user)
date (datetime)
And this is what my query looks like (it's the basic version; there are a lot of filters to be applied, but those work fine for now):
SELECT
u.id AS uid,
COUNT(DISTINCT s_order.id) AS sclr_1,
COUNT(DISTINCT s_filter.id) AS sclr_2
FROM
user u
LEFT JOIN slot s_order ON (s_order.user_id = u.id)
LEFT JOIN slot s_filter ON (s_filter.user_id = u.id)
WHERE
  (
    (
      (s_order.date BETWEEN ? AND ?)
      AND (s_filter.date BETWEEN ? AND ?)
    )
    AND (u.deleted_at IS NULL)
  )
  AND u.userType IN ('2')
GROUP BY
u.id
HAVING
sclr_2 > 0
ORDER BY
sclr_1 DESC
LIMIT
12
Let me explain what I'm trying to achieve here:
I need to filter users who have any slots between 1 week ago and 1 week ahead, then order them by the count of slots available between now and 1 week ahead. The part of the query causing issues is the LEFT JOIN of s_filter, and I'm wondering whether there's a way to improve its performance.
Any help is really appreciated; even if it's plain SQL, I'll try to convert it to DQL myself!
#UPDATE
Just an additional bit of info that I forgot: the LIMIT in the query is for pagination purposes!
#UPDATE 2
After a while of tweaking the query I figured out that I can use a JOIN for filtering instead of LEFT JOIN + COUNT, so my query now looks like this:
SELECT
u.id AS uid, COUNT(DISTINCT s_order.id) AS ordinal
FROM
langu_user u
LEFT JOIN
slot s_order ON (s_order.user_id = u.id) AND s_order.date BETWEEN '2017-02-03 14:03:22' AND '2017-02-10 14:03:22'
JOIN
slot s_filter ON (s_filter.user_id = u.id) AND s_filter.date BETWEEN '2017-01-27 14:03:22' AND '2017-02-10 14:03:22'
WHERE
u.deleted_at IS NULL
AND u.userType IN ('2')
GROUP BY u.id
ORDER BY ordinal DESC
LIMIT 12
And it went down from 4.1-4.3s to ~3.6s.
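For what it's worth, a composite index that covers both the join key and the date range filter usually helps this shape of query; assuming slot doesn't already have one (the index name is illustrative):
CREATE INDEX idx_slot_user_date ON slot (user_id, `date`);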
I have this query, which retrieves 10 ($limit) rows from MySQL:
"SELECT content.loc,content.id,content.title,
voting_count.up,voting_count.down
FROM
content,voting_count
WHERE names.id = voting_count.unique_content_id
ORDER BY content.id DESC $limit"
This query worked great for posts that were already in the database and had votes; however, new posts won't show.
A vote row is inserted the first time someone votes on a post. I guess that's the reason why new posts won't be listed: there is no unique_content_id to connect to.
If I change the query to this:
"SELECT content.loc,content.id,content.title
FROM
content
ORDER BY content.id DESC $limit"
it works, but I can't access the voting_count.up and voting_count.down columns.
How could I access both pieces of information in a single query? Is it doable?
If some data might not exist in one of the tables, you should use a LEFT JOIN instead of an INNER JOIN:
SELECT content.loc, content.id, content.title,
-- the COALESCE function will show 0 if there are no
-- related records in table voting_count
COALESCE(voting_count.up, 0) as votes_up,
COALESCE(voting_count.down, 0) as votes_down
FROM content LEFT JOIN voting_count
ON content.id = voting_count.unique_content_id
ORDER BY content.id DESC
As someone else mentioned above, what is names.id? However, perhaps the following might be of use, assuming the join should have been from content.id to voting_count.unique_content_id:
$sql="select
c.`loc`,c.`id`, c.`title`,
case
when v.`up` is null then
0
else
v.`up`
end as 'up',
case
when v.`down` is null then
0
else
v.`down`
end as 'down'
from `content` c
left outer join `voting_count` v on v.`unique_content_id`=c.`id`
order by c.`id` desc {$limit}";
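As a side note, MySQL's IFNULL can collapse each CASE block into a single call; this sketch is equivalent to the query above:
$sql="select
c.`loc`,c.`id`, c.`title`,
ifnull(v.`up`, 0) as 'up',
ifnull(v.`down`, 0) as 'down'
from `content` c
left outer join `voting_count` v on v.`unique_content_id`=c.`id`
order by c.`id` desc {$limit}";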
I have two tables:
history
business
I want to run this query:
SELECT name, talias.*
FROM
(SELECT business.bussName as name, history.*
FROM history
INNER JOIN business on history.bussID = business.bussID
WHERE history.activity = 'Insert' OR history.activity = 'Update'
UNION
SELECT NULL as name, history.*
FROM history
WHERE history.activity = 'Delete'
) as talias
WHERE 1
order by talias.date DESC
LIMIT $fetch,20
This query takes 13 seconds. I think the problem is that MySQL joins all the rows of the history and business tables, while it should join just 20 rows!
How could I fix that?
If I understand you correctly, you want all rows from history where the activity is 'Delete', plus all those rows where the activity is 'Insert' or 'Update' and a corresponding row exists in the business table.
I don't know if that is going to be faster than your query - you will need to check the execution plan to verify this.
SELECT *
FROM history
where activity = 'Delete'
or ( activity in ('Insert','Update')
AND exists (select 1
from business
where history.bussID = business.bussID))
order by `date` DESC
LIMIT $fetch,20
Edit (after the question has changed)
If you do need columns from the business table, replacing the union with an outer join might improve performance.
But to be honest, I don't expect it. The MySQL optimizer isn't very smart and I wouldn't be surprised if the outer join was actually implemented using some kind of union. Again only you can test that by looking at the execution plan.
SELECT h.*,
b.bussName as name
FROM history h
LEFT JOIN business b
ON h.bussID = b.bussID
AND h.activity in ('Insert','Update')
WHERE h.activity in ('Delete', 'Insert','Update')
ORDER BY h.`date` DESC
LIMIT $fetch,20
Btw: date is a horrible column name. First because it's a reserved word, second (and more important) because it doesn't document anything. Is that the "creation date"? The "deletion date"? A "due date"? Some other date?
Try this:
SELECT h.*
FROM history AS h
WHERE (h.activity IN ('Insert', 'Update')
AND EXISTS (SELECT * FROM business AS b WHERE b.bussID = h.bussID))
OR h.activity = 'Delete'
ORDER BY h.date DESC
LIMIT $fetch, 20
For the ORDER BY and LIMIT to be efficient, make sure you have an index on history.date.
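If that index is missing, creating it is a one-liner (the index name is illustrative):
CREATE INDEX idx_history_date ON history (`date`);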
I'm using a query which generally executes in under a second, but sometimes takes between 10 and 40 seconds to finish. I'm actually not totally clear on how the subquery works; I just know that it works, in that it gives me 15 rows for each faver_profile_id.
I'm logging slow queries and it's telling me 5823244 rows were examined, which is odd because there aren't anywhere close to that many rows in any of the tables involved (the favorites table has the most at 50,000 rows).
Can anyone offer me some pointers? Is it an issue with the subquery and needing to use filesort?
EDIT: Running explain shows that the users table is not using an index (even though id is the primary key). Under extra it says: Using temporary; Using filesort.
SELECT F.id,F.created,U.username,U.fullname,U.id,I.*
FROM favorites AS F
INNER JOIN users AS U ON F.faver_profile_id = U.id
INNER JOIN items AS I ON F.notice_id = I.id
WHERE faver_profile_id IN (360,379,95,315,278,1)
AND F.removed = 0
AND I.removed = 0
AND F.collection_id is null
AND I.nudity = 0
AND (SELECT COUNT(*) FROM favorites WHERE faver_profile_id = F.faver_profile_id
AND created > F.created AND removed = 0 AND collection_id is null) < 15
ORDER BY F.faver_profile_id, F.created DESC;
The number of rows examined is large because many rows have been examined more than once. You are getting this because of an incorrectly optimized query plan which results in table scans where index lookups should have been performed. In this case the number of rows examined is multiplicative, i.e. of an order of magnitude comparable to the product of the total numbers of rows of more than one table.
Make sure that you have run ANALYZE TABLE on your three tables.
Read up on how to avoid table scans, then identify and create any missing indexes.
Rerun ANALYZE and re-EXPLAIN your queries; the number of examined rows should drop dramatically.
If not, post the full explain plan.
Use query hints to force the use of indices (to see the index names for a table, use SHOW INDEX):
SELECT
F.id,F.created,U.username,U.fullname,U.id,I.*
FROM favorites AS F FORCE INDEX (faver_profile_id_key)
INNER JOIN users AS U FORCE INDEX FOR JOIN (PRIMARY) ON F.faver_profile_id = U.id
INNER JOIN items AS I FORCE INDEX FOR JOIN (PRIMARY) ON F.notice_id = I.id
WHERE faver_profile_id IN (360,379,95,315,278,1)
AND F.removed = 0
AND I.removed = 0
AND F.collection_id is null
AND I.nudity = 0
AND (SELECT COUNT(*) FROM favorites FORCE INDEX (faver_profile_id_key) WHERE faver_profile_id = F.faver_profile_id
AND created > F.created AND removed = 0 AND collection_id is null) < 15
ORDER BY F.faver_profile_id, F.created DESC;
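For reference, the ANALYZE step from the list above is a one-liner covering the three tables involved:
ANALYZE TABLE favorites, users, items;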
You may also change your query to use GROUP BY faver_profile_id/HAVING count < 15 instead of the nested SELECT COUNT(*) subquery, as suggested by vartec. The performance of your original query and vartec's should be comparable if both are properly optimized, e.g. using hints (yours would use nested index lookups, whereas vartec's would use a hash-based strategy).
I think with GROUP BY and HAVING it should be faster.
Is that what you want?
SELECT F.id,F.created,U.username,U.fullname,U.id, I.field1, I.field2, count(*) as CNT
FROM favorites AS F
INNER JOIN users AS U ON F.faver_profile_id = U.id
INNER JOIN items AS I ON F.notice_id = I.id
WHERE faver_profile_id IN (360,379,95,315,278,1)
AND F.removed = 0
AND I.removed = 0
AND F.collection_id is null
AND I.nudity = 0
GROUP BY F.id,F.created,U.username,U.fullname,U.id,I.field1, I.field2
HAVING CNT < 15
ORDER BY F.faver_profile_id, F.created DESC;
Don't know which fields from items you need, so I've put placeholders.
I suggest you use MySQL's EXPLAIN to see how your MySQL server handles the query. My bet is that your indexes aren't optimal, but EXPLAIN should do much better than my bet.
You could loop over each id and use LIMIT instead of the COUNT(*) subquery:
foreach $id in [123,456,789]:
SELECT
F.id,
F.created,
U.username,
U.fullname,
U.id,
I.*
FROM
favorites AS F INNER JOIN
users AS U ON F.faver_profile_id = U.id INNER JOIN
items AS I ON F.notice_id = I.id
WHERE
F.faver_profile_id = {$id} AND
I.removed = 0 AND
I.nudity = 0 AND
F.removed = 0 AND
F.collection_id is null
ORDER BY
F.faver_profile_id,
F.created DESC
LIMIT
15;
I'll suppose the result of that query is intended to be shown as a paged list. In that case, perhaps you could consider doing a simpler unjoined query, plus a second query for each row shown, reading only the 15, 20 or 30 elements displayed. Isn't a JOIN a heavy operation? This would simplify the query, and it wouldn't get slower as the joined tables grow.
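A sketch of that two-step approach, reusing the names from the question (note that the item-level filters move into the per-row lookups, so a page may come up short when an item gets filtered out):
-- step 1: page through favorites alone
SELECT F.id, F.created, F.faver_profile_id, F.notice_id
FROM favorites AS F
WHERE F.faver_profile_id IN (360,379,95,315,278,1)
AND F.removed = 0
AND F.collection_id IS NULL
ORDER BY F.faver_profile_id, F.created DESC
LIMIT 30;
-- step 2: for each row actually displayed, fetch its user and item
SELECT username, fullname FROM users WHERE id = ?;
SELECT * FROM items WHERE id = ? AND removed = 0 AND nudity = 0;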
Tell me if I'm wrong, please.