My table is reasonably small, around 50,000 rows. My schema is as follows:
DAILY
match_id
user_id
result
round
tournament_id
Query:
SELECT user_id
FROM `daily`
WHERE user_id IN (SELECT user_id
FROM daily
WHERE round > 25
AND tournament_id = 24
AND (result = 'Won' OR result = 'Lost'))
Using the IN keyword the way you are is very dangerous from a performance perspective. Depending on the optimizer, it can result in the subquery [(select user_id from daily where round > 25 and tournament_id=24 and (result='Won' or result='Lost'))] being run once per row of the outer table, which is 50,000 times in this case.
You'll want to convert this into a join, something to the effect of:
select user_id from daily a join
(select user_id from daily where round > 25 and tournament_id=24 and (result='Won' or result='Lost')) b on a.user_id = b.user_id
Doing something similar to this will result in only two queries and a join.
As Cybernate pointed out, in your specific example you can simply use WHERE clauses, but I went ahead and suggested this in case your query is actually more complex than what you posted.
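One caveat with the join rewrite is that a join repeats an outer row once per matching row in the derived table, whereas IN does not. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in database (the sample rows are made up for illustration); it suggests adding DISTINCT to the derived table if the join should return the same rows as the IN form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily (match_id INTEGER, user_id INTEGER, result TEXT,"
    " round INTEGER, tournament_id INTEGER)"
)
# Made-up sample rows: user 10 qualifies twice, user 20 once.
conn.executemany(
    "INSERT INTO daily VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Won", 26, 24),
        (2, 10, "Lost", 27, 24),
        (3, 10, "Won", 5, 24),
        (4, 20, "Won", 30, 24),
        (5, 30, "Draw", 30, 24),
        (6, 40, "Won", 30, 7),
    ],
)

FILTER = "round > 25 AND tournament_id = 24 AND result IN ('Won', 'Lost')"


def user_ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# Original form: IN with a subquery.
in_rows = user_ids(
    f"SELECT user_id FROM daily WHERE user_id IN"
    f" (SELECT user_id FROM daily WHERE {FILTER})"
)
# Plain join: user 10 matches twice in the derived table, so rows repeat.
join_rows = user_ids(
    f"SELECT a.user_id FROM daily a JOIN"
    f" (SELECT user_id FROM daily WHERE {FILTER}) b ON a.user_id = b.user_id"
)
# Join against a de-duplicated derived table: same rows as the IN form.
distinct_join_rows = user_ids(
    f"SELECT a.user_id FROM daily a JOIN"
    f" (SELECT DISTINCT user_id FROM daily WHERE {FILTER}) b"
    f" ON a.user_id = b.user_id"
)

print(len(in_rows), len(join_rows), len(distinct_join_rows))
```

Whether the duplicates matter depends on what the real query does with the result, so treat this as a check to run against your own data rather than a rule.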
First, verify the indexes and add them as suggested earlier.
Also, why are you using IN if you are querying data from the same table?
Change your query to:
SELECT user_id
FROM daily
WHERE round > 25
AND tournament_id = 24
AND ( result = 'Won'
OR result = 'Lost' )
Your query only needs to be:
SELECT d.user_id
FROM DAILY d
WHERE d.round > 25
AND d.tournament_id = 24
AND d.result IN ('Won', 'Lost')
Indexes should be considered on:
DAILY.round
DAILY.tournament_id
DAILY.result
This should return in a millisecond.
SELECT user_id FROM daily WITH(NOLOCK)
where user_id in (select user_id from daily WITH(NOLOCK) where round > 25 and tournament_id = 24 and (result = 'Won' or result = 'Lost'))
Then make sure there is an index on the filter columns.
CREATE NONCLUSTERED INDEX IX_1 ON daily (round ASC, tournament_id ASC, result ASC)
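To check whether an index on the filter columns is actually picked up, most databases can print the query plan. Below is a small sketch that uses SQLite (via Python's sqlite3 module) purely as a stand-in; MySQL and SQL Server have their own EXPLAIN facilities and may choose differently. The index name ix_daily_filter and the column order (equality columns first, the range column last) are assumptions for illustration, not something taken from the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily (match_id INTEGER, user_id INTEGER, result TEXT,"
    " round INTEGER, tournament_id INTEGER)"
)
# Hypothetical composite index: equality columns first, range column last.
conn.execute(
    "CREATE INDEX ix_daily_filter ON daily (tournament_id, result, round)"
)

query = (
    "SELECT user_id FROM daily"
    " WHERE round > 25 AND tournament_id = 24 AND result IN ('Won', 'Lost')"
)

# The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for detail in plan:
    print(detail)

# True when the planner mentions the index by name.
uses_index = any("ix_daily_filter" in detail for detail in plan)
print("index used:", uses_index)
```

The exact wording of the plan differs between SQLite versions, so the sketch only looks for the index name in the output.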
Related
I have this query (mysql):
SELECT `budget_items`.*
FROM `budget_items`
WHERE (budget_category_id = 4
AND ((is_custom_for_family = 0)
OR (is_custom_for_family = 1
AND custom_item_family_id = 999))
AND ((EXISTS
(SELECT 1
FROM balance_histories
WHERE balance_histories.budget_item_id = budget_items.id
AND balance_histories.family_id = 999
AND payment_date >= '2021-02-01'
AND payment_date <= '2021-02-28' ))
OR (EXISTS
(SELECT 1
FROM budget_lines
WHERE family_id = 999
AND budget_id = 188311
AND budget_item_id = budget_items.id
AND amount > 0))))
It runs multiple times on app start. It takes more than 10 seconds (all of them).
I have indexes on:
balance_histories table: budget_item_id, family_id (tried also payment_date)
budget_lines table: family_id, budget_id, budget_item_id
How can I improve the speed? Query or maybe mysql (8) configuration.
balance_histories table:
budget_lines table:
I would start this query in reverse of what you have. Assuming you COULD have years of data, but your EXISTS query is looking more specifically at a date-range, or specific budget lines, start there, it will probably be much smaller. Once you have DISTINCT IDs, then go back to the budget items by qualified ID PLUS the additional criteria.
To help optimize the queries, I would have indexes on:
table               index
balance_histories   ( family_id, payment_date, budget_item_id )
budget_lines        ( family_id, budget_id, amount )
budget_items        ( id, budget_category_id, is_custom_for_family, custom_item_family_id )
select
bi.*
from
-- pre-query a list of DISTINCT IDs from the balance history
-- and budget lines that qualify. THEN join to the rest.
( select distinct
bh.budget_item_id id
from
balance_histories bh
where
bh.family_id = 999
AND bh.payment_date >= '2021-02-01'
AND bh.payment_date <= '2021-02-28'
UNION
select
bl.budget_item_id
FROM
budget_lines bl
WHERE
bl.family_id = 999
AND bl.budget_id = 188311
AND bl.amount > 0 ) PQ
JOIN budget_items bi
on PQ.id = bi.id
AND bi.budget_category_id = 4
AND ( bi.is_custom_for_family = 0
OR
( bi.is_custom_for_family = 1
AND bi.custom_item_family_id = 999 )
)
Feedback
As with many SQL queries, there are typically multiple ways to get a solution. Sometimes using EXISTS works well, sometimes not as much. You need to consider the cardinality of your data, and that is what I was shooting for. Look at what you were asking for first: get budget items that are all category 4 and where custom-for-family is 1 or 0 (which is all of them), but if custom for a family, only those for family 999. You were correct on your balance of AND/OR. However, this goes through EVERY RECORD, and if you have millions of rows, that is what you are scanning through. Only after scanning through every row do you run a secondary query (for each record that qualified) against the histories for the specific date range OR the family/budget lines.
My guess is that the number of records returned from your two EXISTS queries was going to be very small. So starting with a DISTINCT list of just those IDs that are part of that union gives a very small subset. Once that single "ID" is found, it becomes a direct match to the budget items table, and the final filtering on category ID / family / custom item is applied.
Having indexes that better match the context of your query's WHERE clause will optimize pulling data. I have answered several other questions with similar resolutions, and those answers clarify the indexes and the reasons for them.
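Before swapping one form for the other, it is worth confirming that both return the same rows. The sketch below does that on a tiny made-up data set, using Python's sqlite3 module as a stand-in for MySQL; the sample rows are invented, and only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE budget_items (id INTEGER PRIMARY KEY, budget_category_id INTEGER,
                           is_custom_for_family INTEGER,
                           custom_item_family_id INTEGER);
CREATE TABLE balance_histories (budget_item_id INTEGER, family_id INTEGER,
                                payment_date TEXT);
CREATE TABLE budget_lines (family_id INTEGER, budget_id INTEGER,
                           budget_item_id INTEGER, amount INTEGER);
-- Made-up sample data.
INSERT INTO budget_items VALUES
  (1, 4, 0, NULL), (2, 4, 1, 999), (3, 4, 1, 123),
  (4, 5, 0, NULL), (5, 4, 0, NULL), (6, 4, 0, NULL);
INSERT INTO balance_histories VALUES
  (1, 999, '2021-02-10'), (1, 999, '2021-02-20'), (3, 999, '2021-02-05'),
  (4, 999, '2021-02-05'), (5, 999, '2021-01-15'), (6, 999, '2021-02-01');
INSERT INTO budget_lines VALUES
  (999, 188311, 2, 50), (999, 188311, 5, 0),
  (999, 188311, 6, 10), (999, 1, 5, 100);
""")

ITEM_FILTER = (
    "bi.budget_category_id = 4 AND (bi.is_custom_for_family = 0"
    " OR (bi.is_custom_for_family = 1 AND bi.custom_item_family_id = 999))"
)

# Form 1: the question's EXISTS-based query (selecting only the id).
exists_sql = f"""
SELECT bi.id FROM budget_items bi
WHERE {ITEM_FILTER}
  AND (EXISTS (SELECT 1 FROM balance_histories bh
               WHERE bh.budget_item_id = bi.id AND bh.family_id = 999
                 AND bh.payment_date >= '2021-02-01'
                 AND bh.payment_date <= '2021-02-28')
       OR EXISTS (SELECT 1 FROM budget_lines bl
                  WHERE bl.family_id = 999 AND bl.budget_id = 188311
                    AND bl.budget_item_id = bi.id AND bl.amount > 0))
ORDER BY bi.id
"""

# Form 2: collect the qualifying ids first (UNION de-duplicates), then join.
union_sql = f"""
SELECT bi.id
FROM (SELECT bh.budget_item_id AS id FROM balance_histories bh
      WHERE bh.family_id = 999
        AND bh.payment_date >= '2021-02-01'
        AND bh.payment_date <= '2021-02-28'
      UNION
      SELECT bl.budget_item_id FROM budget_lines bl
      WHERE bl.family_id = 999 AND bl.budget_id = 188311
        AND bl.amount > 0) pq
JOIN budget_items bi ON pq.id = bi.id
WHERE {ITEM_FILTER}
ORDER BY bi.id
"""

exists_ids = [row[0] for row in conn.execute(exists_sql)]
union_ids = [row[0] for row in conn.execute(union_sql)]
print(exists_ids, union_ids)
```

Matching results on a small sample say nothing about speed; that still has to be measured on the real tables with the real indexes.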
I have 2 tables that look like the following:
TABLE 1 TABLE 2
user_id | date accountID | date | hours
And I'm trying to add up the hours by the week. If I use the following statement I get the correct results:
SELECT
SUM(hours) as totalHours
FROM
hours
WHERE
accountID = 244
AND
date >= '2014-02-02' and date < '2014-02-09'
GROUP BY
accountID
But when I join the two tables I get a number like 336640 when it should be 12
SELECT
SUM(hours) as totalHours
FROM
hours
JOIN table1 ON
user_id = accountID
WHERE
accountID = 244
AND
date >= '2014-02-02' and date < '2014-02-09'
GROUP BY
accountID
Does anyone know why this is?
EDIT: Turns out I just needed to add DISTINCT, thanks!
JOIN operations usually generate more rows in the result table: a join's result has a row for every possible pair of rows in the two joined tables that meets the criterion in the ON clause. If there are multiple rows in table1 that match each row in hours, the result of your join will repeat hours.accountID and hours.hours many times. So, adding up the hours yields a high result.
The reason is that the table you are joining to matches multiple rows in the first table. These all get added together.
The solution is to do the aggregation in a subquery before doing the join:
select totalhours
from (SELECT SUM(hours) as totalHours
FROM hours
WHERE accountID = 244 AND
date >= '2014-02-02' and date < '2014-02-09'
GROUP BY accountID
) h join
table1 t1
on t1.user_id = h.accountID;
I suspect your actual query is more complicated. For instance, table1 is not referenced in this query so the join is only doing filtering/duplication of rows. And the aggregation on hours is irrelevant when you are choosing only one account.
You should probably be specifying LEFT JOIN to be sure that it won't eliminate rows that don't match.
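Also, date BETWEEN ? AND ? can be easier to read than date >= ? AND date < ?, but note that BETWEEN includes both endpoints, so the two are only equivalent if the upper bound is adjusted (for a DATE column, '2014-02-08' instead of '2014-02-09').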
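The row multiplication is easy to reproduce on a small scale. The sketch below uses Python's sqlite3 module with made-up rows (three table1 rows for account 244, and two hours entries that happen to have the same value). It compares the plain join, SUM(DISTINCT ...), and aggregating first and then joining to a de-duplicated table1. It is meant as an illustration of the effect, not as the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hours (accountID INTEGER, date TEXT, hours INTEGER);
CREATE TABLE table1 (user_id INTEGER, date TEXT);
-- Made-up sample data: account 244 logged 5 + 5 + 2 = 12 hours that week.
INSERT INTO hours VALUES
  (244, '2014-02-03', 5), (244, '2014-02-04', 5), (244, '2014-02-05', 2),
  (244, '2014-02-10', 8), (7, '2014-02-03', 3);
INSERT INTO table1 VALUES
  (244, '2014-01-01'), (244, '2014-01-02'), (244, '2014-01-03'),
  (7, '2014-01-01');
""")

WEEK = "h.accountID = 244 AND h.date >= '2014-02-02' AND h.date < '2014-02-09'"

# Plain join: every hours row is repeated once per matching table1 row.
naive = conn.execute(
    f"SELECT SUM(h.hours) FROM hours h"
    f" JOIN table1 t ON t.user_id = h.accountID WHERE {WEEK}"
).fetchone()[0]

# SUM(DISTINCT ...) collapses equal values, so the two 5-hour days count once.
distinct_sum = conn.execute(
    f"SELECT SUM(DISTINCT h.hours) FROM hours h"
    f" JOIN table1 t ON t.user_id = h.accountID WHERE {WEEK}"
).fetchone()[0]

# Aggregate first, then join to one row per user.
fixed = conn.execute(
    f"SELECT agg.totalHours"
    f" FROM (SELECT h.accountID, SUM(h.hours) AS totalHours"
    f"       FROM hours h WHERE {WEEK} GROUP BY h.accountID) agg"
    f" JOIN (SELECT DISTINCT user_id FROM table1) t1"
    f"   ON t1.user_id = agg.accountID"
).fetchall()

print(naive, distinct_sum, fixed)
```

With this data the DISTINCT variant undercounts, which suggests that the fix from the EDIT only gives the right total when no two entries in the week have the same number of hours.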
I would like to get the average, or at least the sum, of 200,000 rows from a MySQL database. This is how I am querying the database, but the amount of data is too large for me to query this way because I cannot afford to overload the server.
SELECT user_id, total_email FROM email_users
WHERE email_code = 1
LIMIT 200000
SELECT SUM(total_email), AVG(total_email) FROM email_users
WHERE user_id IN
(
01, 02,..., 200000-th user_id
)
My question is: is there a way to combine the two queries into one, so that I get just the sum or average for 200,000 email_users rows that have email_code = 1?
EDIT: Thanks to all that have answered. I didn't realise the answer was so easy - nested select statement.
You can do this with a subquery:
SELECT SUM(total_email), AVG(total_email)
from (SELECT eu.*
FROM email_users eu
WHERE eu.email_code = 1
LIMIT 200000
) eu
Some notes. First, using limit without an order by gives indeterminate results. You could (in theory) run this query twice and get different results. Second, this assumes that there is a field called total_email in email_users.
SELECT SUM(total_email), AVG(total_email)
FROM (SELECT total_email
FROM email_users
WHERE email_code = 1
LIMIT 200000) x
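How about something like this, assuming you just want any 200K records from the DB where email_code = 1? Note that MySQL does not support LIMIT inside an IN subquery, so on MySQL the second form is the one to use.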
SELECT SUM(total_email), AVG(total_email) FROM email_users
WHERE user_id IN
(
SELECT user_id
FROM email_users
WHERE email_code = 1 LIMIT 200000
)
or
SELECT SUM(total_email), AVG(total_email) FROM
(SELECT user_id , total_email
FROM email_users
WHERE email_code = 1 LIMIT 200000) AS t
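As a small runnable sketch of the derived-table approach, the following uses Python's sqlite3 module with five made-up rows and a hypothetical row_limit of 3 in place of 200,000. It also adds an ORDER BY inside the subquery (ordering by user_id is an assumption) so that the same rows are picked on every run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE email_users (user_id INTEGER PRIMARY KEY, email_code INTEGER,
                          total_email INTEGER);
-- Made-up sample data.
INSERT INTO email_users VALUES
  (1, 1, 10), (2, 0, 99), (3, 1, 20), (4, 1, 30), (5, 1, 40);
""")

row_limit = 3  # hypothetical stand-in for 200000

# Limit inside the derived table, aggregate outside it.
total, average = conn.execute(
    """
    SELECT SUM(total_email), AVG(total_email)
    FROM (SELECT total_email
          FROM email_users
          WHERE email_code = 1
          ORDER BY user_id      -- makes the choice of rows deterministic
          LIMIT ?) AS limited
    """,
    (row_limit,),
).fetchone()

print(total, average)
```

The aggregate runs only over the rows that survive the inner LIMIT, so the server never has to ship the 200,000 ids back to the client.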
Table temporary_search_table
post_id,property_status, property_address,....more 30 field
Table search_meta
meta_id,search_id,status,created_date
I need the total number of rows whose created_date is yesterday. For each row in temporary_search_table there may be multiple entries in search_meta, so we need to pick the latest entry from search_meta and check that its created_date is yesterday and that property_status is pending. If so, we count the row. If there is no entry in search_meta for a row in temporary_search_table, that row should not be counted in the results.
Here is my SQL. It works, but for 30,000 rows it takes a lot of time.
SELECT COUNT(id) FROM temporary_search_table
WHERE property_status = 'pending' AND (1 = (SELECT DATEDIFF(NOW(), created_date)
FROM search_meta WHERE post_id = search_id ORDER BY created_date DESC LIMIT 0,1 ))
Thanks in advance.
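Apart from checking the indexes on your tables, it would probably be better not to use a correlated subquery and to join against the latest created_date per search_id instead. A plain join on search_meta would count one row per matching entry rather than only the last one.
SELECT COUNT(*)
FROM temporary_search_table t
INNER JOIN (SELECT search_id, MAX(created_date) AS last_created
            FROM search_meta
            GROUP BY search_id) m ON t.post_id = m.search_id
WHERE t.property_status = 'pending' AND DATEDIFF(NOW(), m.last_created) = 1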
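To see why the count should be taken against only the latest search_meta entry per row, here is a small sketch using Python's sqlite3 module. It compares a plain join on search_meta with a join against the per-search_id maximum of created_date. The sample rows and the fixed today value are made up, SQLite's date() function stands in for MySQL's DATEDIFF/NOW, and created_date is assumed to hold plain YYYY-MM-DD dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE temporary_search_table (post_id INTEGER, property_status TEXT);
CREATE TABLE search_meta (meta_id INTEGER, search_id INTEGER, status TEXT,
                          created_date TEXT);
-- Made-up sample data; "yesterday" is 2021-03-09 relative to the date below.
INSERT INTO temporary_search_table VALUES
  (1, 'pending'), (2, 'pending'), (3, 'sold'), (4, 'pending'), (5, 'pending');
INSERT INTO search_meta VALUES
  (1, 1, 'x', '2021-03-01'), (2, 1, 'x', '2021-03-09'),
  (3, 2, 'x', '2021-03-09'), (4, 2, 'x', '2021-03-10'),
  (5, 3, 'x', '2021-03-09'),
  (6, 5, 'x', '2021-03-09'), (7, 5, 'x', '2021-03-09');
""")

today = "2021-03-10"  # hypothetical fixed date so the result is repeatable

# Plain join: counts every search_meta row from yesterday, latest or not.
naive_count = conn.execute(
    """
    SELECT COUNT(*)
    FROM temporary_search_table t
    JOIN search_meta m ON m.search_id = t.post_id
    WHERE t.property_status = 'pending'
      AND m.created_date = date(?, '-1 day')
    """,
    (today,),
).fetchone()[0]

# Join against the latest created_date per search_id only.
latest_count = conn.execute(
    """
    SELECT COUNT(*)
    FROM temporary_search_table t
    JOIN (SELECT search_id, MAX(created_date) AS last_created
          FROM search_meta
          GROUP BY search_id) m ON m.search_id = t.post_id
    WHERE t.property_status = 'pending'
      AND m.last_created = date(?, '-1 day')
    """,
    (today,),
).fetchone()[0]

print(naive_count, latest_count)
```

In this data post 2 has a newer entry from today and post 5 has two entries from yesterday, which is why the plain join overcounts.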
I have 2 tables
Sleep_sessions [id, user_id, (some other values)]
Tones [id, sleep_sessions.id (FK), (some other values)]
I need to select 10 sleep_sessions where user_id = 55 and where each sleep_session record has at least 2 tone records associated with it.
I currently have the following;
SELECT `sleep_sessions`.*
FROM (`sleep_sessions`)
JOIN `tones` ON sleep_sessions.id = `tones`.`sleep_session_id`
WHERE `user_id` = 55
GROUP BY `sleep_sessions`.`id`
HAVING count(tones.id) > 4
ORDER BY `started` desc
LIMIT 10
However, I've noticed that count(tones.id) seems to count the entire tones table and not just the tones for the sleep_session currently being joined.
Many thanks for your help,
Andy
I'm not sure what went wrong with your query. Maybe try
HAVING count(*) > 4
The following query might be a bit more readable (having can be a bit of a pain to understand):
SELECT *
FROM (`sleep_sessions`)
WHERE `user_id` = 55
AND (SELECT count(*) FROM `tones`
WHERE `sleep_sessions`.`id` = `tones`.`sleep_session_id`) > 4
ORDER BY `started` desc
LIMIT 10
The advantage of this is that you avoid the problematic semantics between the GROUP BY and ORDER BY clauses in your original query, which selects and orders by columns that are not part of the GROUP BY. Most databases other than MySQL would reject your original query. Here's some insight:
http://dev.mysql.com/doc/refman/5.6/en/group-by-hidden-columns.html
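Both forms can be checked side by side on a small data set. The sketch below uses Python's sqlite3 module with made-up rows; the threshold min_tones = 2 follows the "at least 2 tone records" wording of the question rather than the > 4 in the posted SQL, and is a parameter you would adjust. The grouped form lists started in the GROUP BY so that it does not rely on MySQL's handling of ungrouped columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sleep_sessions (id INTEGER PRIMARY KEY, user_id INTEGER,
                             started TEXT);
CREATE TABLE tones (id INTEGER PRIMARY KEY, sleep_session_id INTEGER);
-- Made-up sample data: sessions 1..3 belong to user 55, session 4 does not.
INSERT INTO sleep_sessions VALUES
  (1, 55, '2014-01-01'), (2, 55, '2014-01-02'),
  (3, 55, '2014-01-03'), (4, 99, '2014-01-04');
INSERT INTO tones VALUES
  (1, 1), (2, 1), (3, 1),          -- session 1: three tones
  (4, 2),                          -- session 2: one tone
  (5, 3), (6, 3),                  -- session 3: two tones
  (7, 4), (8, 4), (9, 4), (10, 4), (11, 4);
""")

min_tones = 2  # assumed threshold, taken from the question's wording

# Join + GROUP BY + HAVING: the count is per group, i.e. per session.
grouped = [row[0] for row in conn.execute(
    """
    SELECT s.id
    FROM sleep_sessions s
    JOIN tones t ON t.sleep_session_id = s.id
    WHERE s.user_id = 55
    GROUP BY s.id, s.started
    HAVING COUNT(t.id) >= ?
    ORDER BY s.started DESC
    LIMIT 10
    """,
    (min_tones,),
)]

# Correlated subquery: count the tones for each candidate session.
correlated = [row[0] for row in conn.execute(
    """
    SELECT s.id
    FROM sleep_sessions s
    WHERE s.user_id = 55
      AND (SELECT COUNT(*) FROM tones t
           WHERE t.sleep_session_id = s.id) >= ?
    ORDER BY s.started DESC
    LIMIT 10
    """,
    (min_tones,),
)]

print(grouped, correlated)
```

If the grouped form returns unexpected counts on the real data, comparing it with the correlated form like this is a quick way to tell whether the join condition or the data is at fault.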