Setting a LIMIT before JOINs of tables - mysql

I have the following tables:
client_purchases:
id_sale | id_client | timestamp
file_purchases:
id_sale | id_file
So a client can buy many files in one purchase, and the same file can be bought several times.
I select what I want like this:
SELECT cp.id_sale, fp.id_file
FROM client_purchases AS cp
JOIN file_purchases AS fp
ON cp.id_sale = fp.id_sale;
Works just fine. What I get is something like this:
id_sale | id_file
1 1
1 2
1 3
2 1
3 1
Now, to make sure the query doesn't take forever as my database grows, I wanted to limit the number of rows:
SELECT cp.id_sale, fp.id_file
FROM client_purchases AS cp
JOIN file_purchases AS fp
ON cp.id_sale = fp.id_sale
LIMIT 0,25;
Which returns 25 rows. But what I actually want is 25 different "id_sale" values. So is there a way to tell SQL to count the DISTINCT values of a column and stop when that count reaches a specified number? I also need to be able to set the start and end value of the LIMIT.

You can use JOIN + Subquery
SELECT cp.id_sale, fp.id_file
FROM (SELECT id_sale FROM client_purchases ORDER BY id_sale LIMIT 25) AS cp
JOIN file_purchases AS fp
ON cp.id_sale = fp.id_sale
However this may speed up your query or it may make it go even slower. It all depends on what kinds of indexes you have and how many records you have in the table.
What seems fast with 100 records might be slow with 100M records and vice versa.

There is no such feature in general. You can limit the number of ids using a subquery:
SELECT cp.id_sale, fp.id_file
FROM (SELECT cp.*
FROM client_purchases cp
LIMIT 25
) cp JOIN
file_purchases fp
ON cp.id_sale = fp.id_sale ;
Normally, there would be an ORDER BY before the LIMIT so the query returns consistent results.
However, this is not a general solution, because the 25 ids chosen in client_purchases may not match anything in file_purchases (they may match in your case, but perhaps not in general).
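A minimal sketch of the subquery approach with a stable order and a start offset, run from Python against an in-memory SQLite database so it is easy to try. The sample rows mirror the output shown in the question; the `id_client` and `timestamp` values and the function name `page_of_sales` are made up for illustration. SQLite spells the paging clause `LIMIT count OFFSET start`, whereas MySQL also accepts `LIMIT start, count`.

```python
import sqlite3

def page_of_sales(conn, offset, count):
    """Return (id_sale, id_file) rows for `count` sales, starting at sale number `offset`."""
    sql = """
        SELECT cp.id_sale, fp.id_file
        FROM (SELECT id_sale
              FROM client_purchases
              ORDER BY id_sale          -- stable order, so pages do not shuffle
              LIMIT ? OFFSET ?) AS cp   -- the limit applies to sales, not joined rows
        JOIN file_purchases AS fp ON fp.id_sale = cp.id_sale
        ORDER BY cp.id_sale, fp.id_file
    """
    return conn.execute(sql, (count, offset)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client_purchases (id_sale INTEGER PRIMARY KEY, id_client INTEGER, timestamp TEXT);
    CREATE TABLE file_purchases (id_sale INTEGER, id_file INTEGER);
    -- hypothetical sample data matching the question's example output
    INSERT INTO client_purchases VALUES (1, 10, '2014-01-01'), (2, 11, '2014-01-02'), (3, 10, '2014-01-03');
    INSERT INTO file_purchases VALUES (1, 1), (1, 2), (1, 3), (2, 1), (3, 1);
""")

print(page_of_sales(conn, 0, 2))  # all files of the first two sales
print(page_of_sales(conn, 2, 2))  # the next page of sales
```

Because the join is an inner join, a sale without any rows in file_purchases would still use up one of the `count` slots and then vanish from the result, which is the caveat raised above.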

Related

improve sql query with 2 EXISTS sub queries

I have this query (mysql):
SELECT `budget_items`.*
FROM `budget_items`
WHERE (budget_category_id = 4
AND ((is_custom_for_family = 0)
OR (is_custom_for_family = 1
AND custom_item_family_id = 999))
AND ((EXISTS
(SELECT 1
FROM balance_histories
WHERE balance_histories.budget_item_id = budget_items.id
AND balance_histories.family_id = 999
AND payment_date >= '2021-02-01'
AND payment_date <= '2021-02-28' ))
OR (EXISTS
(SELECT 1
FROM budget_lines
WHERE family_id = 999
AND budget_id = 188311
AND budget_item_id = budget_items.id
AND amount > 0))))
It runs multiple times on app start. All the runs together take more than 10 seconds.
I have indexes on:
balance_histories table: budget_item_id, family_id (tried also payment_date)
budget_lines table: family_id, budget_id, budget_item_id
How can I improve the speed? Query or maybe mysql (8) configuration.
balance_histories table:
budget_lines table:
I would start this query in reverse of what you have. Assuming you COULD have years of data, but your EXISTS query is looking more specifically at a date-range, or specific budget lines, start there, it will probably be much smaller. Once you have DISTINCT IDs, then go back to the budget items by qualified ID PLUS the additional criteria.
To help optimize the queries, I would have indexes on
table index
balance_histories ( family_id, payment_date, budget_item_id )
budget_lines ( family_id, budget_id, amount )
budget_items ( id, budget_category_id, is_custom_for_family, custom_item_family_id )
select
bi.*
from
-- pre-query a list of DISTINCT IDs from the balance history
-- and budget lines that qualify. THEN join to the rest.
( select distinct
bh.budget_item_id id
from
balance_histories bh
where
bh.family_id = 999
AND bh.payment_date >= '2021-02-01'
AND bh.payment_date <= '2021-02-28'
UNION
select
bl.budget_item_id
FROM
budget_lines bl
WHERE
bl.family_id = 999
AND bl.budget_id = 188311
AND bl.amount > 0 ) PQ
JOIN budget_items bi
on PQ.id = bi.id
AND bi.budget_category_id = 4
AND ( bi.is_custom_for_family = 0
OR
( bi.is_custom_for_family = 1
AND bi.custom_item_family_id = 999 )
)
Feedback
As with many SQL queries, there are typically multiple ways to get a solution. Sometimes using EXISTS works well, sometimes not as much. You need to consider the cardinality of your data, and that is what I was shooting for. Look at what you were asking for first: get budget items that are in category 4 where custom-for-family is 0 or 1 (which is all of them), but if custom, only those for family 999. You were correct on your balance of AND/OR. However, this is going through EVERY RECORD, and if you have millions of rows, that is what you are scanning through. Only after scanning through every row are you doing a secondary query (for each record that qualified) against the histories for the specific date range OR the family/budget lines.
My guess is that the number of possible records returned from your two EXISTS queries was going to be very small. So starting by getting a DISTINCT list of just those IDs that are part of the union gives you a very small subset. Once a single ID is found, it becomes a direct match to the budget items table, with the final filtering on category ID / family / custom item considerations.
Having indexes that better match the WHERE clause of your query will optimize pulling the data. I have answered several other questions with similar resolutions and explained the indexes, and why, in those.
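A small sanity check can help before swapping one form for the other: the sketch below runs the EXISTS form and the UNION pre-query form against the same toy data and confirms they return the same ids. It uses Python with an in-memory SQLite database purely for convenience; the rows are invented, only the columns used in the queries are modeled, and it says nothing about how either form performs on MySQL with real data volumes.

```python
import sqlite3

EXISTS_FORM = """
SELECT bi.id FROM budget_items bi
WHERE bi.budget_category_id = 4
  AND (bi.is_custom_for_family = 0
       OR (bi.is_custom_for_family = 1 AND bi.custom_item_family_id = 999))
  AND (EXISTS (SELECT 1 FROM balance_histories bh
               WHERE bh.budget_item_id = bi.id AND bh.family_id = 999
                 AND bh.payment_date >= '2021-02-01' AND bh.payment_date <= '2021-02-28')
       OR EXISTS (SELECT 1 FROM budget_lines bl
                  WHERE bl.family_id = 999 AND bl.budget_id = 188311
                    AND bl.budget_item_id = bi.id AND bl.amount > 0))
ORDER BY bi.id
"""

UNION_FORM = """
SELECT bi.id
FROM (SELECT bh.budget_item_id AS id FROM balance_histories bh
      WHERE bh.family_id = 999
        AND bh.payment_date >= '2021-02-01' AND bh.payment_date <= '2021-02-28'
      UNION  -- UNION (not UNION ALL) removes duplicate ids
      SELECT bl.budget_item_id FROM budget_lines bl
      WHERE bl.family_id = 999 AND bl.budget_id = 188311 AND bl.amount > 0) pq
JOIN budget_items bi ON bi.id = pq.id
WHERE bi.budget_category_id = 4
  AND (bi.is_custom_for_family = 0
       OR (bi.is_custom_for_family = 1 AND bi.custom_item_family_id = 999))
ORDER BY bi.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE budget_items (id INTEGER PRIMARY KEY, budget_category_id INTEGER,
                               is_custom_for_family INTEGER, custom_item_family_id INTEGER);
    CREATE TABLE balance_histories (budget_item_id INTEGER, family_id INTEGER, payment_date TEXT);
    CREATE TABLE budget_lines (family_id INTEGER, budget_id INTEGER, budget_item_id INTEGER, amount REAL);
    -- invented rows: items 1, 2 and 6 should qualify; 3, 4 and 5 should not
    INSERT INTO budget_items VALUES (1, 4, 0, NULL), (2, 4, 1, 999), (3, 4, 1, 555),
                                    (4, 5, 0, NULL), (5, 4, 0, NULL), (6, 4, 0, NULL);
    INSERT INTO balance_histories VALUES (1, 999, '2021-02-10'), (3, 999, '2021-02-11'),
                                         (4, 999, '2021-02-12'), (5, 999, '2021-01-15'),
                                         (6, 999, '2021-02-20'), (6, 999, '2021-02-21');
    INSERT INTO budget_lines VALUES (999, 188311, 2, 50), (999, 188311, 5, 0), (999, 188311, 6, 10);
""")

exists_ids = [row[0] for row in conn.execute(EXISTS_FORM)]
union_ids = [row[0] for row in conn.execute(UNION_FORM)]
print(exists_ids, union_ids)
```

Item 6 appears in both branches of the UNION yet comes back once, which is why the pre-query uses UNION rather than UNION ALL.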

Join not producing results required

I want to gather all the details from a table PROD about rows containing particular triplet-sets of values. For example, I want to get all the data on the rows having columns (ID, NBR AND COP_I) with values (23534, 99, 0232) and (3423,5,09384), etc.
I was wondering about a way to select the triplets rows via a Join, which may be better than the way I am doing it below as that currently does not work.
The following Query produces the required triplets, associated with the top 100 rows:
SELECT ID, NBR, COP_I, SUM(PAD_MN) AS PAD_MN_SUMMED
FROM PROD
WHERE
PROD.FLAG = 0
GROUP BY 1,2,3
ORDER BY 4 DESC, 3,2,1
LIMIT 100 -- top 100 rows
I tried joining to the Query above as follows to get all the details corresponding to those top 100 row triplets:
SELECT PROD.ID, PROD.NBR,PROD.COP_I,PROD.FLAG,PROD.TYPE,PROD.DATE, PROD.PAD_MN
FROM ( SELECT ID, NBR, COP_I, SUM(PAD_MN) AS PAD_MN_SUMMED
FROM PROD
WHERE
PROD.FLAG = 0
GROUP BY 1,2,3
ORDER BY 4 DESC, 3,2,1
LIMIT 100) TAB2
INNER JOIN PROD
ON (PROD.ID = TAB2.ID
AND PROD.NBR = TAB2.NBR
AND PROD.COP_I = TAB2.COP_I)
However, the above query gives me rows not even associated with any of the triplets. I feel like I may be making a mistake with the Join, but I don't know why and how to rectify it. I get a similar issue when using the answer provided below
UPDATE
PROD Table containing 10,000+ rows looks something like:
ID NBR COP_I FLAG TYPE DATE PAD_MN
3423 5 09384 0 BA 14-06-2016 18657.43
546 1098 098 1 CFA 22-03-1998 2394566.92
3423 5 09384 0 AA 28-11-2013 3423534.12
23534 99 0232 0 BA 05-01-2016 7304567.12
Results Required, which is to contain only the top 100 rows information:
ID NBR COP_I FLAG TYPE DATE PAD_MN
23534 99 0232 0 BA 05-01-2016 17370567.09
3423 5 09384 0 AA 28-11-2013 6321009.98
However, the output from my query gives rows, which have triplets (ID,NBR,COP_I) which are not actually outputted from the first Query above that produces the required triplets.
If I understand you correctly, this is what you want:
with join
select prod.*
from (select id, nbr, cop_i, sum(pad_mn) as pad_mn_total
from prod
where prod.flag = 0
group by 1,2,3
order by 4 desc,3,2,1
limit 100) as top_prod
left join prod using (id, nbr, cop_i);
without join
select prod.*
from (select id, nbr, cop_i, sum(pad_mn) as pad_mn_total
from prod
where prod.flag = 0
group by 1,2,3
order by 4 desc,3,2,1
limit 100) as top_prod, prod
where prod.id = top_prod.id
and prod.nbr = top_prod.nbr
and prod.cop_i = top_prod.cop_i;
Using a join is the better way. Before using queries in production, I strongly recommend checking the EXPLAIN output to understand how MySQL will collect the data and how your indexes work for each query.
Here you can find some info about join http://dev.mysql.com/doc/refman/5.7/en/join.html
How to use explain described here http://dev.mysql.com/doc/refman/5.7/en/using-explain.html
BTW: Reading manuals is a good way to resolve problems
UPD: after some discussions in comments:
Q: Is there a way to prevent these "grouped" rows from being restored whilst still retrieving the other info required only for the 100 sorted rows?
A: select sum(pad_mn) as pad_mn_total, prod.* from prod where prod.flag = 0 group by id,nbr,cop_i order by 1 desc,cop_i,nbr,id limit 100
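As a rough illustration of the "top N triplets, then join back for the detail rows" pattern discussed in this thread, here is a sketch in Python against an in-memory SQLite database. It loads the four sample rows from the question, stores COP_I as text so leading zeros survive (an assumption about the real column type), and uses explicit column names instead of positional GROUP BY. The function name `detail_rows_for_top` is hypothetical.

```python
import sqlite3

def detail_rows_for_top(conn, n):
    """Detail rows of prod whose (id, nbr, cop_i) is among the n largest flag = 0 totals."""
    sql = """
        SELECT p.id, p.nbr, p.cop_i, p.type
        FROM (SELECT id, nbr, cop_i, SUM(pad_mn) AS pad_mn_total
              FROM prod
              WHERE flag = 0
              GROUP BY id, nbr, cop_i
              ORDER BY pad_mn_total DESC, cop_i, nbr, id
              LIMIT ?) AS top_prod
        JOIN prod AS p
          ON p.id = top_prod.id AND p.nbr = top_prod.nbr AND p.cop_i = top_prod.cop_i
        ORDER BY top_prod.pad_mn_total DESC, p.pad_mn DESC
    """
    return conn.execute(sql, (n,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE prod (id INTEGER, nbr INTEGER, cop_i TEXT, flag INTEGER,
                                   type TEXT, date TEXT, pad_mn REAL)""")
conn.executemany("INSERT INTO prod VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (3423, 5, "09384", 0, "BA", "14-06-2016", 18657.43),
    (546, 1098, "098", 1, "CFA", "22-03-1998", 2394566.92),
    (3423, 5, "09384", 0, "AA", "28-11-2013", 3423534.12),
    (23534, 99, "0232", 0, "BA", "05-01-2016", 7304567.12),
])

print(detail_rows_for_top(conn, 1))
print(detail_rows_for_top(conn, 2))
```

Note that the outer join condition does not repeat `flag = 0`, so rows of a selected triplet with a different flag would also be returned; add that filter to the outer query if only flag 0 rows are wanted.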

MySQL Query Sorting High Scores

I'm trying to select the top 3 entries from a table called games that has foreign keys to the players and 2 ints for individual scores for the host and opponent.
The Query:
SELECT
games.id, games.host_score, games.opponent_score, player.name
FROM games, player
WHERE player.id = games.host_player_id
|| player.id = games.opponent_player_id
ORDER BY games.host_score, games.opponent_score DESC LIMIT 3
The query completes but it comes back out of order:
id host_score opponent_score name
17 0 0 Temp2
17 0 0 Temp0
16 770 930 Temp0
When I run a query that doesn't have an OR it works. How can I get this method working?
Also is there a way to set a LIMIT of 50 but not count duplicates?
For example if i wanted a limit of 2 but 3 people have the score 50 and 2 people have the score 20 it would return :
id host_score opponent_score name
17 50 0 Temp2
17 50 0 Temp0
17 50 0 Temp1
17 20 0 Temp3
17 20 0 Temp4
Or would it be better to run it as separate queries in PHP?
If you want to order from highest to lowest, you need to specify the direction for each column:
ORDER BY games.host_score DESC, games.opponent_score DESC
When you don't specify a direction, ascending order is assumed.
I think your query is wrong, because one game will be on two rows -- one for the host and one for the opponent.
You want to get both the host and opponent names, so you need to join twice to the player table:
SELECT g.id, g.host_score, g.opponent_score, hp.name as HostName, op.name as OpponentName
FROM games g join
player hp
on hp.id = g.host_player_id join
player op
on op.id = g.opponent_player_id
ORDER BY g.host_score DESC, g.opponent_score DESC
LIMIT 3
You should use OR (instead of ||) in SQL, since || means string concatenation in standard SQL. Also add parentheses to make it readable.
WHERE (player.id = games.host_player_id) OR (player.id = games.opponent_player_id)
To get the top 50 score values (the total row count returned may be higher when scores tie), use the barebones query below and tweak it to your needs.
SELECT g1.id, g1.host_score, COUNT(DISTINCT g2.host_score) AS score_rank
FROM games g1
JOIN games g2 ON g1.host_score <= g2.host_score
GROUP BY g1.id, g1.host_score
HAVING score_rank <= 50
ORDER BY g1.host_score DESC;
I must add that, to avoid this kind of complexity, you can also fetch the data into your application and do this easily in a programming language like Java or PHP. It may mean making more than one query, but it is far simpler and more maintainable over time.
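To make the "limit that does not count duplicates" idea concrete, here is a sketch using the scores from the question's example (three players on 50, two on 20) plus one invented lower score. It ranks each row by the number of distinct scores at or above its own, using a self-join, and keeps rows whose rank is within the limit. It runs in Python on an in-memory SQLite database; the single `host_score` column and the function name `top_score_values` are simplifications for illustration. On MySQL 8 the `DENSE_RANK()` window function would be another way to get the same ranking.

```python
import sqlite3

def top_score_values(conn, n):
    """Rows whose host_score is among the n highest distinct score values (ties included)."""
    sql = """
        SELECT g1.id, g1.host_score
        FROM games g1
        JOIN games g2 ON g2.host_score >= g1.host_score
        GROUP BY g1.id, g1.host_score
        -- number of distinct scores at or above this row's score = its dense rank
        HAVING COUNT(DISTINCT g2.host_score) <= ?
        ORDER BY g1.host_score DESC, g1.id
    """
    return conn.execute(sql, (n,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (id INTEGER PRIMARY KEY, host_score INTEGER)")
conn.executemany("INSERT INTO games VALUES (?, ?)",
                 [(1, 50), (2, 50), (3, 50), (4, 20), (5, 20), (6, 10)])

print(top_score_values(conn, 2))  # five rows: every 50 and every 20
```

The self-join compares every row with every other row, so on a large table the window-function route or application-side filtering is likely to be cheaper.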

SQL join with where and having count() condition

I have 2 tables
Sleep_sessions [id, user_id, (some other values)]
Tones [id, sleep_sessions.id (FK), (some other values)]
I need to select 10 sleep_sessions where user_id = 55 and where each sleep_session record has at least 2 tone records associated with it.
I currently have the following;
SELECT `sleep_sessions`.*
FROM (`sleep_sessions`)
JOIN `tones` ON sleep_sessions.id = `tones`.`sleep_session_id`
WHERE `user_id` = 55
GROUP BY `sleep_sessions`.`id`
HAVING count(tones.id) > 4
ORDER BY `started` desc
LIMIT 10
However, I've noticed that count(tones.id) is basically counting the entire tones table and not just the tones of the sleep_session being joined.
Many thanks for your help,
Andy
I'm not sure what went wrong with your query. Maybe try
HAVING count(*) > 4
The following query might be a bit more readable (HAVING can be a bit of a pain to understand):
SELECT *
FROM (`sleep_sessions`)
WHERE `user_id` = 55
AND (SELECT count(*) FROM `tones`
WHERE `sleep_sessions`.`id` = `tones`.`sleep_session_id`) > 4
ORDER BY `started` desc
LIMIT 10
The advantage of this is that you avoid the questionable semantics you have created between your GROUP BY and ORDER BY clauses. Only MySQL would ever accept your original query. Here's some insight:
http://dev.mysql.com/doc/refman/5.6/en/group-by-hidden-columns.html
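For comparison, the sketch below runs both shapes, the grouped join with HAVING and the correlated COUNT subquery, on a few invented rows and checks that they select the same sessions. It is written in Python against an in-memory SQLite database; the threshold is passed as a parameter (`min_tones`, a hypothetical name) because the question's text and its query disagree on the exact number, and only the columns needed for the example are modeled.

```python
import sqlite3

GROUPED_JOIN = """
    SELECT s.id
    FROM sleep_sessions s
    JOIN tones t ON t.sleep_session_id = s.id
    WHERE s.user_id = ?
    GROUP BY s.id, s.started         -- group by every non-aggregated column used
    HAVING COUNT(t.id) >= ?          -- counts tones per session, not the whole table
    ORDER BY s.started DESC
    LIMIT ?
"""

CORRELATED_SUBQUERY = """
    SELECT s.id
    FROM sleep_sessions s
    WHERE s.user_id = ?
      AND (SELECT COUNT(*) FROM tones t WHERE t.sleep_session_id = s.id) >= ?
    ORDER BY s.started DESC
    LIMIT ?
"""

def sessions_with_tones(conn, sql, user_id, min_tones, limit):
    """Run one of the two queries and return the matching session ids."""
    return [row[0] for row in conn.execute(sql, (user_id, min_tones, limit))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sleep_sessions (id INTEGER PRIMARY KEY, user_id INTEGER, started TEXT);
    CREATE TABLE tones (id INTEGER PRIMARY KEY, sleep_session_id INTEGER);
    -- invented rows: user 55 owns sessions 1-3, user 77 owns session 4
    INSERT INTO sleep_sessions VALUES (1, 55, '2012-01-01'), (2, 55, '2012-01-02'),
                                      (3, 55, '2012-01-03'), (4, 77, '2012-01-04');
    -- session 1 has 2 tones, session 2 has 1, session 3 has 3, session 4 has 5
    INSERT INTO tones (sleep_session_id) VALUES (1), (1), (2), (3), (3), (3), (4), (4), (4), (4), (4);
""")

grouped = sessions_with_tones(conn, GROUPED_JOIN, 55, 2, 10)
correlated = sessions_with_tones(conn, CORRELATED_SUBQUERY, 55, 2, 10)
print(grouped, correlated)
```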

Elegant mysql to select, group, combine multiple rows from one table

Here is a simplified version of my table:
group price spec
a 1 .
a 2 ..
b 1 ...
b 2
c .
. .
. .
I'd like to produce a result like this: (I'll refer to this as result_table)
price_a |spec_a |price_b |spec_b |price_c ...|total_cost
1 |. |1 |.. |... |
(min) (min) =1+1+...
Basically I want to:
select the rows containing the min price within each group
combine columns into a single row
I know this can be done using several queries and/or combined with some non-SQL processing of the results, but I suspect that there may be better solutions.
The reason that I want to do task 2 (combine columns into a single row)
is because I want to do something like the following with the result_table:
select *,
(result_table.total_cost + table1.price + table2.price) as total_combined_cost
from result_table
right join table1
right join table2
This may be too much to ask for, so here are some other thoughts on the problem:
Instead of trying to combine multiple rows(task 2), store them in a temporary table
(which would be easier to calculate the total_cost using sum)
Feel free to drop any thoughts; they don't have to be a complete answer. I feel it's brilliant enough if you have an elegant way to do task 1!
==Edited/Added 6 Feb 2012==
The goal of my program is to identify best combinations of items with minimal cost (and preferably possess higher utilitarian value at the same time).
Considering @ypercube's comment about a large number of groups, a temporary table seems to be the only feasible solution. It was also pointed out that there is no pivoting function in MySQL (although it can be implemented, it's not necessary to perform such an operation).
Okay, after studying @Johan's answer, I'm thinking about something like this for task 1:
select * from
(
select * from
result_table
order by price asc
) as ordered_table
group by `group`
;
Although it looks dodgy, it seems to work.
==Edited/Added 7 Feb 2012==
Since more than one combination may produce the same min value, I have modified my answer:
select result_table.* from
(
select * from
(
select * from
result_table
order by price asc
) as ordered_table
group by `group`
) as single_min_table
inner join result_table
on result_table.`group` = single_min_table.`group`
and result_table.price = single_min_table.price
;
However, I have just realised that there is another problem I need to deal with:
I can not ignore all the spec, since there is a provider property, items from different providers may or may not be able to be assembled together, so to be safe (and to simplify my problem) I decide to combine items from the same provider only, so the problem becomes:
For example if I have an initial table like this(with only 2 groups and 2 providers):
id group price spec provider
1 a 1 . x
2 a 2 .. y
3 a 3 ... y
4 b 1 ... y
5 b 2 x
6 b 3 z
I need to combine
id group price spec provider
1 a 1 . x
5 b 2 x
and
2 a 2 .. y
4 b 1 ... y
The record with id 6 can be eliminated from the choices since its provider does not have all the groups available.
So it's not necessarily to select only the min of each group, rather it's to select one from each group so that for each provider I have a minimal combined cost.
You cannot pivot in MySQL, but you can group results together.
The GROUP_CONCAT function will give you a result like this:
column A column B column c column d
groups specs prices sum(price)
a,b,c some,list,xyz 1,5,7 13
Here's a sample query:
(The query assumes you have a primary (or unique) key called id defined on the target table).
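SELECT
GROUP_CONCAT(a.`group`) AS `groups`
,GROUP_CONCAT(a.spec) AS specs
,GROUP_CONCAT(a.min_price) AS prices
,SUM(a.min_price) AS total_of_min_prices
FROM
( SELECT t.price AS min_price, t.spec, t.`group` FROM table1 AS t
WHERE t.id IN
-- one id per group, taken from the rows that carry that group's minimum price
(SELECT MIN(t2.id) FROM table1 AS t2
WHERE t2.price = (SELECT MIN(t3.price) FROM table1 AS t3 WHERE t3.`group` = t2.`group`)
GROUP BY t2.`group`)
) AS a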
See: http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html
Producing the total_cost only:
SELECT SUM(min_price) AS total_cost
FROM
( SELECT MIN(price) AS min_price
FROM TableX
GROUP BY `group`
) AS grp
If a result set with the minimum prices returned in rows (not in columns), one row per group, is fine, then your problem is of the greatest-n-per-group type. There are various methods to solve it. Here's one:
SELECT tg.grp,
tm.price AS min_price,
tm.spec
FROM
( SELECT DISTINCT `group` AS grp
FROM TableX
) AS tg
JOIN
TableX AS tm
ON
tm.PK = -- PK stands for the primary key of the table
( SELECT tmin.PK
FROM TableX AS tmin
WHERE tmin.`group` = tg.grp
ORDER BY tmin.price ASC
LIMIT 1
)
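The two ideas above, one cheapest row per group and the sum of the per-group minimums, can be tried out with the sketch below. It loads the six sample rows from the question's last edit into an in-memory SQLite database from Python. The table name `items` and the column name `grp` (used instead of `group` to avoid quoting a reserved word) are assumptions for the example, and ties on price are broken by the lowest id.

```python
import sqlite3

# One row per group: the row with the lowest price, found via a correlated LIMIT 1 subquery.
CHEAPEST_PER_GROUP = """
    SELECT tm.grp, tm.price AS min_price, tm.spec
    FROM (SELECT DISTINCT grp FROM items) AS tg
    JOIN items AS tm
      ON tm.id = (SELECT tmin.id
                  FROM items AS tmin
                  WHERE tmin.grp = tg.grp
                  ORDER BY tmin.price ASC, tmin.id ASC  -- lowest id wins a price tie
                  LIMIT 1)
    ORDER BY tm.grp
"""

# Sum of the minimum price of every group.
TOTAL_COST = """
    SELECT SUM(min_price)
    FROM (SELECT MIN(price) AS min_price FROM items GROUP BY grp) AS per_group
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, grp TEXT, price INTEGER, spec TEXT, provider TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?)", [
    (1, "a", 1, ".", "x"),
    (2, "a", 2, "..", "y"),
    (3, "a", 3, "...", "y"),
    (4, "b", 1, "...", "y"),
    (5, "b", 2, "", "x"),
    (6, "b", 3, "", "z"),
])

cheapest = conn.execute(CHEAPEST_PER_GROUP).fetchall()
total_cost = conn.execute(TOTAL_COST).fetchone()[0]
print(cheapest, total_cost)
```

This ignores the provider constraint from the question's last edit; it only covers the per-group minimum and the total.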