Join not producing results required - mysql

I want to gather all the details from a table PROD for rows containing particular triplets of values. For example, I want all the data on the rows whose columns (ID, NBR, COP_I) have the values (23534, 99, 0232), (3423, 5, 09384), etc.
I am looking for a way to select the rows for those triplets via a join, which may be better than my approach below, which currently does not work.
The following query produces the required triplets, i.e. the top 100 rows:
SELECT ID, NBR, COP_I, SUM(PAD_MN) AS PAD_MN_SUMMED
FROM PROD
WHERE
PROD.FLAG = 0
GROUP BY 1,2,3
ORDER BY 4 DESC, 3,2,1
LIMIT 100 -- top 100 rows
I tried joining to the Query above as follows to get all the details corresponding to those top 100 row triplets:
SELECT PROD.ID, PROD.NBR,PROD.COP_I,PROD.FLAG,PROD.TYPE,PROD.DATE, PROD.PAD_MN
FROM ( SELECT ID, NBR, COP_I, SUM(PAD_MN) AS PAD_MN_SUMMED
FROM PROD
WHERE
PROD.FLAG = 0
GROUP BY 1,2,3
ORDER BY 4 DESC, 3,2,1
LIMIT 100) TAB2
INNER JOIN PROD
ON (PROD.ID = TAB2.ID
AND PROD.NBR = TAB2.NBR
AND PROD.COP_I = TAB2.COP_I)
However, the above query gives me rows that are not associated with any of the triplets. I suspect I am making a mistake with the join, but I don't know what it is or how to fix it. I get a similar issue when using the answer provided below.
UPDATE
The PROD table, which contains 10,000+ rows, looks something like this:
ID     NBR   COP_I  FLAG  TYPE  DATE        PAD_MN
3423   5     09384  0     BA    14-06-2016  18657.43
546    1098  098    1     CFA   22-03-1998  2394566.92
3423   5     09384  0     AA    28-11-2013  3423534.12
23534  99    0232   0     BA    05-01-2016  7304567.12
Results required, containing only the information for the top 100 rows:
ID     NBR   COP_I  FLAG  TYPE  DATE        PAD_MN
23534  99    0232   0     BA    05-01-2016  17370567.09
3423   5     09384  0     AA    28-11-2013  6321009.98
However, my query returns rows whose triplets (ID, NBR, COP_I) are not among those produced by the first query above.

If I understand you correctly, this is what you want:
with join
select prod.*
from (
  select id, nbr, cop_i, sum(pad_mn) as pad_mn_total
  from prod
  where prod.flag = 0
  group by 1, 2, 3
  order by 4 desc, 3, 2, 1
  limit 100
) as top_prod
left join prod using (id, nbr, cop_i);
without join
select prod.*
from (
  select id, nbr, cop_i, sum(pad_mn) as pad_mn_total
  from prod
  where prod.flag = 0
  group by 1, 2, 3
  order by 4 desc, 3, 2, 1
  limit 100
) as top_prod, prod
where prod.id = top_prod.id
  and prod.nbr = top_prod.nbr
  and prod.cop_i = top_prod.cop_i;
The better way is to use the join. Before running queries in production, I strongly recommend checking the EXPLAIN output to understand how MySQL will collect the data and how your indexes are used by each query.
Here you can find some info about join http://dev.mysql.com/doc/refman/5.7/en/join.html
How to use explain described here http://dev.mysql.com/doc/refman/5.7/en/using-explain.html
BTW: Reading manuals is a good way to resolve problems
UPD: after some discussions in comments:
Q: Is there a way to prevent these "grouped" rows from being restored whilst still retrieving the other info required only for the 100 sorted rows?
A: select sum(pad_mn) as pad_mn_total, prod.* from prod where prod.flag = 0 group by id,nbr,cop_i order by 1 desc,cop_i,nbr,id limit 100
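To make the join-back pattern concrete, here is a minimal sketch that can be run without a MySQL server. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption for illustration only. The sample rows come from the question, except for one hypothetical FLAG = 1 row that shares the top triplet. COP_I is stored as text here (also an assumption) so that leading zeros survive. The sketch repeats the FLAG = 0 filter on the outer query; without it, the join would also return FLAG = 1 rows that happen to share a triplet.

```python
import sqlite3

# SQLite in memory stands in for MySQL here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PROD (ID INTEGER, NBR INTEGER, COP_I TEXT, FLAG INTEGER, "
    "TYPE TEXT, DATE TEXT, PAD_MN REAL)"
)
conn.executemany(
    "INSERT INTO PROD VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (3423, 5, "09384", 0, "BA", "14-06-2016", 18657.43),
        (546, 1098, "098", 1, "CFA", "22-03-1998", 2394566.92),
        (3423, 5, "09384", 0, "AA", "28-11-2013", 3423534.12),
        (23534, 99, "0232", 0, "BA", "05-01-2016", 7304567.12),
        # Hypothetical row: same triplet as the top one, but FLAG = 1.
        (23534, 99, "0232", 1, "CFA", "01-01-2015", 10.0),
    ],
)

TOP_N = 1  # the question uses 100; 1 keeps the sample output small

detail_rows = conn.execute(
    """
    SELECT PROD.ID, PROD.NBR, PROD.COP_I, PROD.FLAG, PROD.TYPE, PROD.PAD_MN
    FROM (
        SELECT ID, NBR, COP_I, SUM(PAD_MN) AS PAD_MN_SUMMED
        FROM PROD
        WHERE FLAG = 0
        GROUP BY ID, NBR, COP_I
        ORDER BY PAD_MN_SUMMED DESC, COP_I, NBR, ID
        LIMIT ?
    ) AS TAB2
    INNER JOIN PROD
        ON PROD.ID = TAB2.ID
       AND PROD.NBR = TAB2.NBR
       AND PROD.COP_I = TAB2.COP_I
    WHERE PROD.FLAG = 0  -- repeat the filter so FLAG = 1 rows are not joined back
    ORDER BY PROD.PAD_MN DESC
    """,
    (TOP_N,),
).fetchall()

for row in detail_rows:
    print(row)
```

Only the detail rows of the highest-summed triplet with FLAG = 0 come back. Whether the outer FLAG filter is wanted depends on the requirement, so treat it as one possible reading of the question.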

Related

Possible to count number of occurrences in a "group" in MySQL?

Sorry if the title is misleading, I don't really know the terminology for what I want to accomplish. But let's consider this table:
CREATE TABLE entries (
id INT NOT NULL,
number INT NOT NULL
);
Let's say it contains four numbers associated with each id, like this:
id number
1 0
1 9
1 17
1 11
2 5
2 8
2 9
2 0
.
.
.
Is it possible, with an SQL query only, to count the number of ids that have any two given numbers (a pair) associated with them?
Let's say I want to count how many unique ids have both the number 0 and the number 9. In the sample data above, 0 and 9 occur together two times (once for id=1 and once for id=2). I can't think of how to write an SQL query that solves this. Is it possible? Maybe my table structure is wrong, but that's how my data is organized right now.
I have tried sub-queries, unions, joins and everything else, but haven't found a way yet.
You can use GROUP BY and HAVING clauses:
SELECT COUNT(s.id)
FROM(
SELECT t.id
FROM YourTable t
WHERE t.number in(0,9)
GROUP BY t.id
HAVING COUNT(distinct t.number) = 2) s
Or with EXISTS():
SELECT COUNT(distinct t.id)
FROM YourTable t
WHERE EXISTS(SELECT 1 FROM YourTable s
WHERE t.id = s.id and s.number IN(0,9)
HAVING COUNT(distinct s.number) = 2)
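The GROUP BY / HAVING approach can be tried on the sample data from the question. The following is a small sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption for illustration); the rows for id 3 are hypothetical and were added only to show an id that contains just one of the two numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INT NOT NULL, number INT NOT NULL)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [
        (1, 0), (1, 9), (1, 17), (1, 11),
        (2, 5), (2, 8), (2, 9), (2, 0),
        # Hypothetical id that has 0 but not 9, so it must not be counted.
        (3, 0), (3, 4), (3, 6), (3, 7),
    ],
)

# Keep only rows holding one of the two numbers, then keep the ids
# for which both distinct numbers are present, and count those ids.
(pair_count,) = conn.execute(
    """
    SELECT COUNT(*)
    FROM (
        SELECT id
        FROM entries
        WHERE number IN (?, ?)
        GROUP BY id
        HAVING COUNT(DISTINCT number) = 2
    ) AS matching_ids
    """,
    (0, 9),
).fetchone()

print(pair_count)
```

COUNT(DISTINCT number) rather than COUNT(*) matters if the same number can appear more than once for an id.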

SQL nested query alternative

SQL newbie here. I have a table where I have OrderID and State of the order.
OrderID, State, TimeStamp
1 0 20210502151515
1 1 20210502161616
1 2 20210502171717
2 0 20210502151617
2 1 20210502161718
2 3 20210502171819
3 0 20210502121617
3 4 20210502121718
4 0 20210502131617
5 0 20210502141718
6 0 20210502151515
6 2 20210502171717
7 0 20210502151515
7 1 20210502171717
Where 0 = OPEN, 1=Partially Completed, 2=Fully Completed, 3=Cancelled, 4=Rejected
I want to run a query where it would return orders that are OPEN (state=0) or Partially Completed (state=1). If the order is Fully completed, Cancelled or Rejected, I want to exclude those orders.
If I simply select rows with state 0 or 1, the result also includes orders that were later fully completed, cancelled or rejected. I need to exclude every order that has any state other than 0 or 1.
I have this query which works but I am wondering if there is a better way to do it.
SELECT *
FROM myTable
WHERE OrderID NOT IN (select OrderId from myTable where state not in (0, 1))
Thank you!
If you just want orders, you can use aggregation:
select orderid
from mytable
group by orderid
having max(state) <= 1;
If you want the details of the rows, you can use join, in or exists along with this query.
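As a rough check of the aggregation idea against the sample data in the question, here is a sketch using Python's sqlite3 module in place of MySQL (an assumption for illustration). It uses `MAX(State) <= 1` so that orders that are only OPEN are kept as well, and then fetches the detail rows with IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (OrderID INT, State INT, TimeStamp INT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        (1, 0, 20210502151515), (1, 1, 20210502161616), (1, 2, 20210502171717),
        (2, 0, 20210502151617), (2, 1, 20210502161718), (2, 3, 20210502171819),
        (3, 0, 20210502121617), (3, 4, 20210502121718),
        (4, 0, 20210502131617),
        (5, 0, 20210502141718),
        (6, 0, 20210502151515), (6, 2, 20210502171717),
        (7, 0, 20210502151515), (7, 1, 20210502171717),
    ],
)

# Orders whose highest state is 0 or 1 never reached states 2, 3 or 4.
active_orders = [
    order_id
    for (order_id,) in conn.execute(
        """
        SELECT OrderID
        FROM myTable
        GROUP BY OrderID
        HAVING MAX(State) <= 1
        ORDER BY OrderID
        """
    )
]

# Detail rows for those orders, using the aggregate query inside IN.
active_rows = conn.execute(
    """
    SELECT OrderID, State
    FROM myTable
    WHERE OrderID IN (
        SELECT OrderID FROM myTable GROUP BY OrderID HAVING MAX(State) <= 1
    )
    ORDER BY OrderID, State
    """
).fetchall()

print(active_orders)
print(active_rows)
```

This relies on the state codes being ordered so that every "finished" state is greater than 1, which holds for the codes listed in the question.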
There is a better way, but not with sql. Maybe you want to create another table to store the current state of the order. It is much easier to get what you want.
In old-fashioned SQL you could easily solve this with a correlated subquery:
Select * from Mytable a
Where a.Timestamp=(Select max(Timestamp) from Mytable b
Where a.OrderId=b.OrderID)
and a.state < 2
This selects only the most recent record by order (max(Timestamp)) and further only keeps it if that most recent record is 0 or 1.
Might something like this work, or would it end up being too slow as the record set grows?
select Mytable.orderid, Mytable.State, Mytable.TimeStamp
from Mytable
inner join
(
select orderid, max(Timestamp) newesttimestamp
from Mytable
group by orderid
) newestorderdetails
on Mytable.orderid = newestorderdetails.orderid and Mytable.Timestamp = newestorderdetails.newesttimestamp
where Mytable.state IN (0, 1)
order by Mytable.orderid, Mytable.state

Setting a LIMIT before JOINs of tables

I have the following tables:
client_purchases:
id_sale | id_client | timestamp
file_purchases:
id_sale | id_file
So with one purchase a client can buy many files, and a file can be bought several times.
I select what I want like this:
SELECT cp.id_sale, fp.id_file
FROM client_purchases AS cp
JOIN file_purchases AS fp
ON cp.id_sale = fp.id_sale;
Works just fine. What I get is something like this:
id_sale | id_file
1 1
1 2
1 3
2 1
3 1
Now to make sure that it doesn't take forever to look through my database if it grows I wanted to limit the amount of rows.
SELECT cp.id_sale, fp.id_file
FROM client_purchases AS cp
JOIN file_purchases AS fp
ON cp.id_sale = fp.id_sale
LIMIT 0,25;
Which returns 25 rows. But what I actually want is 25 different "id_sale" values. So is there a way to tell SQL to count the DISTINCT values of a column and stop when that count reaches a specified number? I also need to be able to set the start and end values of the LIMIT.
You can use JOIN + Subquery
SELECT cp.id_sale, fp.id_file
FROM (SELECT id_sale FROM client_purchases ORDER BY id_sale LIMIT 0, 25) AS cp
JOIN file_purchases AS fp
ON cp.id_sale = fp.id_sale
However, this may speed up your query or it may make it even slower. It all depends on what kinds of indexes you have and how many records are in the table.
What seems fast with 100 records might be slow with 100M records, and vice versa.
There is no general feature for this. You can limit the number of ids using a subquery:
SELECT cp.id_sale, fp.id_file
FROM (SELECT cp.*
FROM client_purchases cp
LIMIT 25
) cp JOIN
file_purchases fp
ON cp.id_sale = fp.id_sale ;
Normally, there would be an ORDER BY before the LIMIT so the query returns consistent results.
However, this is not a general solution, because the 25 ids chosen in client_purchases may not match anything in file_purchases (they may match in your case, but perhaps not in general).
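The "limit the sales first, then join" idea, with a start offset for paging, might look like the following sketch. It runs on Python's sqlite3 module rather than MySQL (an assumption for illustration), and the helper name `page_of_sales`, the client ids and the timestamps are hypothetical. The sale/file pairs are the ones shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE client_purchases (
        id_sale INTEGER PRIMARY KEY, id_client INTEGER, timestamp TEXT
    );
    CREATE TABLE file_purchases (id_sale INTEGER, id_file INTEGER);
    """
)
conn.executemany(
    "INSERT INTO client_purchases VALUES (?, ?, ?)",
    [(1, 10, "2016-01-01"), (2, 11, "2016-01-02"), (3, 10, "2016-01-03")],
)
conn.executemany(
    "INSERT INTO file_purchases VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (3, 1)],
)


def page_of_sales(conn, offset, page_size):
    """Return (id_sale, id_file) rows for one page of distinct sales."""
    return conn.execute(
        """
        SELECT cp.id_sale, fp.id_file
        FROM (
            SELECT id_sale
            FROM client_purchases
            ORDER BY id_sale          -- ordering makes the pages repeatable
            LIMIT ? OFFSET ?
        ) AS cp
        JOIN file_purchases AS fp ON cp.id_sale = fp.id_sale
        ORDER BY cp.id_sale, fp.id_file
        """,
        (page_size, offset),
    ).fetchall()


first_page = page_of_sales(conn, 0, 2)   # sales 1 and 2
second_page = page_of_sales(conn, 2, 2)  # sale 3
print(first_page)
print(second_page)
```

The limit applies to sales, not to joined rows, so a page can contain more rows than `page_size`. As noted above, a sale without any rows in file_purchases still uses up a slot in the page.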

Adding Row Values when there are no results - MySQL

Problem Statement: I need my result set to include records that would not naturally return because they are NULL.
I'm going to put some simplified code here since my code seems to be too long.
Table Scores has Company_type, Company, Score, Project_ID
Select Score, Count(Project_ID)
FROM Scores
WHERE company_type= :company_type
GROUP BY Score
Results in the following:
Score Projects
5 95
4 94
3 215
2 51
1 155
Everything is working fine until I apply a condition to company_type that does not include results in one of the 5 score categories. When this happens, I don't have 5 rows in my result set any more.
It displays like this:
Score Projects
5 5
3 6
1 3
I'd like it to display like this:
Score Projects
5 5
4 0
3 6
2 0
1 3
I need the results to always display 5 rows. (Scores = 1-5)
I tried one of the approaches below by Spencer7593. My simplified query now looks like this:
SELECT i.score AS Score, IFNULL(count(*), 0) AS Projects
FROM (SELECT 5 AS score
UNION ALL
SELECT 4
UNION ALL
SELECT 3
UNION ALL
SELECT 2
UNION ALL
SELECT 1) i
LEFT JOIN Scores ON Scores.score = i.score
GROUP BY Score
ORDER BY i.score DESC
And gives the following results, which is accurate except that the rows with 1 in Projects should actually be 0 because they are derived by the "i". There are no projects with a score of 5 or 2.
Score Projects
5 1
4 5
3 6
2 1
1 3
Solved! I just needed to adjust my count to look specifically at the project column, i.e. count(Project_ID) rather than count(*). This returned the expected results.
If you always want your query to return 5 rows, with Score values of 5,4,3,2,1... you'll need a rowsource that supplies those Score values.
One approach would be to use a simple query to return those fixed values, e.g.
SELECT 5 AS score
UNION ALL SELECT 4
UNION ALL SELECT 3
UNION ALL SELECT 2
UNION ALL SELECT 1
Then use that query as inline view, and do an outer join operation to the results from your current query
SELECT i.score AS `Score`
, IFNULL(q.projects,0) AS `Projects`
FROM ( SELECT 5 AS score
UNION ALL SELECT 4
UNION ALL SELECT 3
UNION ALL SELECT 2
UNION ALL SELECT 1
) i
LEFT
JOIN (
-- the current query with "missing" Score rows goes here
-- for completeness of this example, without the query
-- we emulate that result with a different query
SELECT 5 AS score, 95 AS projects
UNION ALL SELECT 3, 215
UNION ALL SELECT 1, 155
) q
ON q.score = i.score
ORDER BY i.score DESC
It doesn't have to be the inline view used in this example. But there does need to be a rowsource that the rows can be returned from. You could, for example, have a simple table that contains those five rows, with those five score values.
This is just one example of the general approach. It might be possible to modify your existing query to return the rows you want. But without seeing the query, the schema, and example data, we can't tell.
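The rowsource idea can be sketched end to end as follows, using Python's sqlite3 module as a stand-in for MySQL (an assumption for illustration) and hypothetical sample rows. Two details are worth noticing: the company_type filter sits in the ON clause so that unmatched score rows survive the outer join, and COUNT(s.project_id) is used instead of COUNT(*), because COUNT(*) counts the row produced by the outer join itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Scores (company_type TEXT, company TEXT, score INT, project_id INT)"
)
# Hypothetical sample data: type 'A' has no projects with score 4 or 2.
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?, ?)",
    [
        ("A", "Acme", 5, 1),
        ("A", "Acme", 3, 2),
        ("A", "Acme", 3, 3),
        ("A", "Bolt", 1, 4),
        ("B", "Core", 4, 5),
    ],
)

QUERY = """
SELECT i.score,
       COUNT(*)            AS star_count,  -- counts the NULL-extended row too
       COUNT(s.project_id) AS projects     -- ignores NULLs, so it can be 0
FROM (
    SELECT 5 AS score
    UNION ALL SELECT 4
    UNION ALL SELECT 3
    UNION ALL SELECT 2
    UNION ALL SELECT 1
) AS i
LEFT JOIN Scores AS s
       ON s.score = i.score
      AND s.company_type = :company_type
GROUP BY i.score
ORDER BY i.score DESC
"""

rows = conn.execute(QUERY, {"company_type": "A"}).fetchall()
for score, star_count, projects in rows:
    print(score, star_count, projects)
```

The `star_count` column is included only to show the difference; a real query would keep just `projects`.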
FOLLOWUP
Based on the edit to the question, showing an example of the current query.
If we are guaranteed that the five values of Score will always appear in the Scores table, we could do conditional aggregation, writing a query like this:
SELECT s.score
, COUNT(IF(s.company_type = :company_type,s.project_id,NULL)) AS projects
FROM Scores s
GROUP BY s.score
ORDER BY s.score DESC
Note that this will require a scan of all the rows, so it may not perform as well. (The "trick" is the IF function, which returns a NULL value in place of project_id when the row would have been excluded by the WHERE clause.)
If we are guaranteed that project_id is non-NULL, we could use a more terse MySQL shorthand expression to achieve an equivalent result...
, IFNULL(SUM(s.company_type = :company_type),0) AS projects
This works because MySQL returns 1 when the comparison is TRUE, and otherwise returns 0 or NULL.
Try something like this:
select s.score, coalesce(x.cnt, 0) as cnt
from (
select distinct score from scores
) s
left outer join (
Select Score, Count(Project_ID) cnt
FROM Scores
WHERE company_type= :company_type
GROUP BY Score
) x
on s.score = x.score
Your posted query would not work without a group by statement. However, even there, if you don't have those particular scores for that company type, it wouldn't work either.
One option is to use an outer join. That would require a little more work though.
Here's another option using conditional aggregation:
select Score, sum(company_type=:company_type)
from Scores
group by Score
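Conditional aggregation can be tried out with a sketch like the one below. It uses Python's sqlite3 module instead of MySQL (an assumption for illustration; SQLite also evaluates a comparison to 1 or 0) and hypothetical sample rows in which every score from 1 to 5 appears at least once for some company type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (company_type TEXT, score INT, project_id INT)")
# Hypothetical data: each score 1-5 exists for some company type.
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?)",
    [
        ("A", 5, 1), ("A", 3, 2), ("A", 3, 3), ("A", 1, 4),
        ("B", 4, 5), ("B", 2, 6), ("B", 3, 7),
    ],
)

QUERY = """
SELECT score,
       SUM(company_type = :company_type) AS projects  -- 1 per matching row, else 0
FROM Scores
GROUP BY score
ORDER BY score DESC
"""

rows_a = conn.execute(QUERY, {"company_type": "A"}).fetchall()
rows_none = conn.execute(QUERY, {"company_type": "C"}).fetchall()
print(rows_a)
print(rows_none)
```

All five rows come back even for a company type with no projects, but only because every score exists somewhere in the table. A score that is missing from Scores entirely would still be missing from the result.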

MySQL : Group By Clause Not Using Index when used with Case

I'm using MySQL.
I can't change the DB structure, so that's not an option, sadly.
THE ISSUE:
When I use GROUP BY with CASE (as needed in my situation), MySQL uses
filesort and the delay is huge (approx. 2-3 minutes):
http://sqlfiddle.com/#!9/f97d8/11/0
But when I don't use CASE and just GROUP BY group_id, MySQL easily uses
the index and the result is fast:
http://sqlfiddle.com/#!9/f97d8/12/0
Scenario: DETAILED
Table msgs, containing records of sent messages, with fields:
id,
user_id, (the guy who sent the message)
type, (0 means it's a group msg. All the msgs sent to a group are marked with its group_id, so if 5 msgs were sent to group_id = 5, the table will have 5 records with group_id = 5 and type = 0. For type > 0, group_id is NULL, because all other types are individual msgs sent to a single recipient.)
group_id (if type=0, will contain group_id, else NULL)
Table contains approx 10 million records for user id 50001 and with different types (i.e group as well as individual msgs)
Now the QUERY:
SELECT
msgs.*
FROM
msgs
INNER JOIN accounts
ON (
msgs.user_id = accounts.id
)
WHERE 1
AND msgs.user_id IN (50111)
AND msgs.type IN (0, 1, 5, 7)
GROUP BY CASE `msgs`.`type` WHEN 0 THEN `msgs`.`group_id` ELSE `msgs`.`id` END
ORDER BY `msgs`.`group_id` DESC
LIMIT 100
I HAVE to get the summary in a single QUERY,
so the msgs sent to a group, say group 5 (which has 5 records in this table), will be shown as 1 record in the summary (I may show a COUNT later, but that's not an issue).
The individual msgs have NULL as group_id, so I can't just use 'GROUP BY group_id', because that would group all individual msgs into a single record, which is not acceptable.
Sample output can be something like:
id  owner_id  type  group_id  COUNT
1   50001     0     2         5
1   50001     1     NULL      1
1   50001     4     NULL      1
1   50001     0     7         5
1   50001     5     NULL      1
1   50001     5     NULL      1
1   50001     5     NULL      1
1   50001     0     10        5
Now the problem is that the GROUP BY with CASE (which I currently think I need, because I only want to group by group_id if type=0) causes a lot of delay, because it does not use the indexes that are used when I leave out the CASE (i.e. just group by group_id). Please see the SQLFiddles above for the EXPLAIN results.
Can anyone please advise how to optimize this?
UPDATE
I tried a workaround that somewhat works (it drops the INITIAL queries to about 1 sec). It uses UNION to shrink the result set that forces MySQL to write to disk for the filesort (because the result set is huge): it limits the group msgs and the individual msgs separately (see the query below).
-- The first part of the union retrieves group msgs (type 0, grouped by group_id) and applies a limit to cap the result set.
-- The second part retrieves individual msgs (type != 0, grouped by msgs.id, which is not strictly necessary but guards against duplicate entries caused by joins) and applies a limit to cap the result set.
-- The two parts are then combined to get the desired result set.
Here's the query:
SELECT
*
FROM
(
(
SELECT
msgs.id as reference_id, user_id, type, group_id
FROM
msgs
INNER JOIN accounts
ON (msgs.user_id = accounts.id)
WHERE 1
AND accounts.id IN (50111 ) AND type = 0
GROUP BY msgs.group_id
ORDER BY msgs.id DESC
LIMIT 40
)
UNION
ALL
(
SELECT
msgs.id as reference_id, user_id, type, group_id
FROM
msgs
INNER JOIN accounts
ON (
msgs.user_id = accounts.id
)
WHERE 1
AND msgs.type != 0
AND accounts.id IN (50111)
GROUP BY msgs.id
ORDER BY msgs.id
LIMIT 40
)
) AS temp
ORDER BY reference_id
LIMIT 20,20
But it has a lot of caveats:
- I need to handle the limit in the inner queries as well. Say there are 20 recs per page and I'm on page 4. For the inner queries I need to apply limit 0,80, since I can't know how many records each of the two parts contributed to the previous 3 pages. So as the records per page and the number of pages grow, my query gets heavier. With 1k recs per page on page 100, or page 1K, the load gets much heavier and the time increases sharply.
- I need to handle ordering in the inner queries and then again on the result set produced by the union; conditions need to be applied to both inner queries separately (but that's not much of an issue).
- I can't use calc_found_rows, so I will need to get the count with a separate query.
The main issue is the first one: the further I go with the pagination, the heavier it gets.
Would this run faster?
SELECT id, user_id, type, group_id
FROM
( SELECT id, user_id, type, group_id, IFNULL(group_id, id) AS foo
FROM msgs
WHERE user_id IN (50111)
AND type IN (0, 1, 5, 7)
) AS t
GROUP BY foo
ORDER BY `group_id` DESC
LIMIT 100
It needs INDEX(user_id, type).
Does this give the 'correct' answer?
SELECT DISTINCT *
FROM msgs
WHERE user_id IN (50111)
AND type IN (0, 1, 5, 7)
GROUP BY IFNULL(group_id, id)
ORDER BY `group_id` DESC
LIMIT 100
(It needs the same index)
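To check whether grouping on IFNULL(group_id, id) produces the wanted summary, here is a small sketch using Python's sqlite3 module in place of MySQL (an assumption for illustration; it says nothing about MySQL's query plan or speed). The sample rows and the index name `idx_user_type` are hypothetical. The sketch also adds type and group_id to the GROUP BY, which keeps the selected columns well defined and avoids merging a group with an individual msg whose id happens to equal that group_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE msgs (id INTEGER PRIMARY KEY, user_id INT, type INT, group_id INT)"
)
# Index suggested in the answer above; the name is hypothetical.
conn.execute("CREATE INDEX idx_user_type ON msgs (user_id, type)")
conn.executemany(
    "INSERT INTO msgs VALUES (?, ?, ?, ?)",
    [
        (101, 50111, 0, 2), (102, 50111, 0, 2), (103, 50111, 0, 2),  # group 2
        (104, 50111, 1, None),                                       # individual
        (105, 50111, 5, None),                                       # individual
        (106, 50111, 0, 7), (107, 50111, 0, 7),                      # group 7
        (108, 50111, 4, None),  # type 4 is not in the wanted type list
        (109, 99999, 0, 2),     # another user's msg
    ],
)

summary = conn.execute(
    """
    SELECT MIN(id) AS first_id, type, group_id, COUNT(*) AS msg_count
    FROM msgs
    WHERE user_id = ?
      AND type IN (0, 1, 5, 7)
    GROUP BY IFNULL(group_id, id), type, group_id
    ORDER BY first_id
    """,
    (50111,),
).fetchall()

for row in summary:
    print(row)
```

Group msgs collapse into one row per group with their count, while each individual msg stays on its own row.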