MySQL query advice request - mysql

Can I get some help with this query? I will explain in detail. To make it easier, I'll use an example with HTML tags and attributes.
I have 3 tables:
tbl1 - contains all the HTML tags (tagId, tagName)
tbl2 - contains all the available attributes that might appear in the tags (attId, attName)
tbl3 - is a map table for tbl1 and tbl2 (tagId, attId)
I want to select all the attributes (and related information about the attributes) that belong to the chosen tag (in the example below tag id 4) and the ID of the tag.
Here is an example of what I'd like to get from the query:
attId   tagId   attName
50      4       The name of the attribute with id 50
89      4       The name of the attribute with id 89
114     4       The name of the attribute with id 114
Below is the query that I've made, but I believe there is a better way.
SELECT tbl2.*, tbl3.tagId
FROM tbl2 JOIN tbl3
WHERE tbl3.attId IN (
SELECT tbl3.attId FROM tbl3 where tbl3.tagId=4
)
AND tbl3.attId = tbl2.attId
GROUP BY tbl3.attId
Thanks in advance.

Issue 1
You're doing a cross join that you later reduce to an inner join by putting a filter in the WHERE clause.
This is the old-school way of writing joins, a holdover from implicit join syntax.
Most SQL dialects (but not MySQL) reject a JOIN without an ON clause.
Never write a join without an ON clause.
If you really want a cross join, use this syntax: select * from a cross join b
Issue 2
The subselect is not needed; you can do the test directly in the WHERE clause.
SELECT tbl2.*, tbl3.tagId
FROM tbl2
INNER JOIN tbl3 ON (tbl3.attId = tbl2.attId)
WHERE tbl3.tagId = 4
GROUP BY tbl3.attId
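To check that the explicit join returns the rows shown in the question, here is a minimal sketch that runs the query against an in-memory SQLite database through Python's sqlite3 module (SQLite stands in for MySQL here). The sample tag and attribute names and the function name attributes_for_tag are made up for illustration, and the sketch assumes (tagId, attId) is unique in tbl3, in which case no GROUP BY is needed.

```python
import sqlite3


def attributes_for_tag(tag_id):
    """Return (attId, tagId, attName) rows for one tag via an explicit INNER JOIN."""
    con = sqlite3.connect(":memory:")
    # Hypothetical sample data; only the table and column names come from the question.
    con.executescript("""
        CREATE TABLE tbl1 (tagId INTEGER PRIMARY KEY, tagName TEXT);
        CREATE TABLE tbl2 (attId INTEGER PRIMARY KEY, attName TEXT);
        CREATE TABLE tbl3 (tagId INTEGER, attId INTEGER, PRIMARY KEY (tagId, attId));
        INSERT INTO tbl1 VALUES (4, 'a'), (5, 'img');
        INSERT INTO tbl2 VALUES (50, 'href'), (89, 'target'), (114, 'rel'), (200, 'src');
        INSERT INTO tbl3 VALUES (4, 50), (4, 89), (4, 114), (5, 200);
    """)
    rows = con.execute(
        """
        SELECT tbl2.attId, tbl3.tagId, tbl2.attName
        FROM tbl2
        INNER JOIN tbl3 ON tbl3.attId = tbl2.attId
        WHERE tbl3.tagId = ?
        ORDER BY tbl2.attId
        """,
        (tag_id,),
    ).fetchall()
    con.close()
    return rows


print(attributes_for_tag(4))
```

The join condition lives in ON and the tag filter in WHERE, so the mapping table is only read for the chosen tag.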
Issue 3a
You are grouping by tbl3.attId, which in this context is the same as grouping by tbl2.attId (because of the ON (tbl3.attId = tbl2.attId) clause).
The latter makes a bit more sense, because you are doing select tbl2.*, not select tbl3.*
Issue 3b
If attId is not a primary or unique key for tbl2, then the result of the query will be indeterminate.
That means the query returns an arbitrary row from the rows that share the same attId. (With ONLY_FULL_GROUP_BY enabled, the default since MySQL 5.7.5, the query is rejected instead.) If you don't want that, you'll have to choose a row based on some criterion. A HAVING clause cannot do this, because it is evaluated after the arbitrary row has already been picked; filter on a correlated subquery instead.
Example
/*selects the most recent row per attId, instead of an arbitrary one; assumes tbl2 has a dateadded column*/
SELECT tbl2.*, tbl3.tagId
FROM tbl2
INNER JOIN tbl3 ON (tbl3.attId = tbl2.attId)
WHERE tbl3.tagId = 4
AND tbl2.dateadded = (SELECT MAX(t.dateadded) FROM tbl2 t WHERE t.attId = tbl2.attId)
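One deterministic way to pick a specific row per attId is to filter on a correlated MAX subquery rather than relying on GROUP BY. The sketch below tries this on SQLite through Python's sqlite3 module. The dateadded column is an assumption (as in the example above), the sample rows and the function name latest_attribute_rows are invented, and dates are stored as ISO strings so that MAX compares them correctly.

```python
import sqlite3


def latest_attribute_rows(tag_id):
    """Return the most recently added tbl2 row per attId for one tag."""
    con = sqlite3.connect(":memory:")
    # attId is deliberately NOT unique in tbl2 here, to mimic the problem case.
    con.executescript("""
        CREATE TABLE tbl2 (attId INTEGER, attName TEXT, dateadded TEXT);
        CREATE TABLE tbl3 (tagId INTEGER, attId INTEGER);
        INSERT INTO tbl2 VALUES
            (50, 'old name',  '2020-01-01'),
            (50, 'new name',  '2021-06-01'),
            (89, 'only name', '2019-03-03');
        INSERT INTO tbl3 VALUES (4, 50), (4, 89);
    """)
    rows = con.execute(
        """
        SELECT t2.attId, t3.tagId, t2.attName
        FROM tbl2 AS t2
        INNER JOIN tbl3 AS t3 ON t3.attId = t2.attId
        WHERE t3.tagId = ?
          AND t2.dateadded = (SELECT MAX(x.dateadded)
                              FROM tbl2 AS x
                              WHERE x.attId = t2.attId)
        ORDER BY t2.attId
        """,
        (tag_id,),
    ).fetchall()
    con.close()
    return rows


print(latest_attribute_rows(4))
```

If two rows for the same attId share the maximum dateadded, both are returned, so a tie-breaker column would be needed in that case.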

You can do it like this instead.
SELECT tbl2.*, tbl3.tagId
FROM tbl2 JOIN tbl3
ON tbl3.attId = tbl2.attId
WHERE tbl3.tagId = <selected id>
GROUP BY tbl3.attId

SELECT tbl2.attId, tbl1.tagId, tbl2.attName FROM tbl3, tbl1, tbl2
WHERE tbl1.tagId = tbl3.tagId AND tbl2.attId = tbl3.attId AND tbl3.tagId = 4

Related

How to limit list by using HAVING in MySQL?

I am trying to limit a result set to 5 rows per merchant_id by using HAVING in MySQL 5.7. Unfortunately this does not seem to work and I cannot figure out why.
My SQL query joins three tables together and identifies the categories in which the manufacturer has a listing. I want to limit this list to 5 per merchant_id:
SELECT
mcs.CAT_ID
FROM tbl1 mc
INNER JOIN tbl2 mcs ON mc.ID = mcs.CAT_ID
INNER JOIN tbl3 p ON mcs.ARTICLE_ID = p.SKU
WHERE
p.MANUFACTURER_ID =18670
group by
mc.merchant_ID, mcs.CAT_ID
HAVING
COUNT(mc.merchant_id) < 5
I read on SO that HAVING gets executed without looking at the WHERE clause, but what would be the right way to limit this list?
You didn't provide the table schemas or sample data, so I can't be sure about the exact query, but I'd use the following approach:
SELECT
mc.merchant_id, t.CAT_ID
FROM tbl1 mc
INNER JOIN LATERAL (
SELECT mcs.CAT_ID
FROM tbl2 AS mcs
WHERE mc.ID = mcs.CAT_ID
AND EXISTS (
SELECT 'x'
FROM tbl3 AS p
WHERE p.SKU = mcs.ARTICLE_ID
AND p.MANUFACTURER_ID = 18670
)
LIMIT 5
) AS t ON TRUE
;
The subquery in the join selects the CAT_IDs related to that mc.ID that have a listing for the selected manufacturer (18670), limited to 5 rows. Because the subquery refers to mc.ID, it has to be a LATERAL derived table, which MySQL supports only from 8.0.14 onward; MySQL 5.7 rejects the reference to the outer table. This way the limit of 5 is applied to each row of tbl1, which is per merchant_id only if tbl1 holds one row per merchant.
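HAVING COUNT(...) < 5 keeps or drops whole groups; it does not cap the number of rows per merchant. A different technique that needs neither LATERAL nor window functions (neither is available in MySQL 5.7) is a correlated count: keep a row only if fewer than N rows of the same merchant sort before it. The sketch below shows the idea on SQLite through Python's sqlite3 module. The listing table is hypothetical and stands for the de-duplicated (merchant_id, cat_id) pairs produced by the three-table join; the function name and the data are invented.

```python
import sqlite3


def first_n_categories_per_merchant(n):
    """Return at most n (merchant_id, cat_id) rows per merchant, lowest cat_id first."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table: one row per (merchant, category) pair, no duplicates.
    con.execute(
        "CREATE TABLE listing ("
        "merchant_id INTEGER, cat_id INTEGER, "
        "PRIMARY KEY (merchant_id, cat_id))"
    )
    sample = [(1, c) for c in range(10, 17)] + [(2, 20), (2, 21)]
    con.executemany("INSERT INTO listing VALUES (?, ?)", sample)
    rows = con.execute(
        """
        SELECT l.merchant_id, l.cat_id
        FROM listing AS l
        WHERE (SELECT COUNT(*)
               FROM listing AS x
               WHERE x.merchant_id = l.merchant_id
                 AND x.cat_id < l.cat_id) < ?
        ORDER BY l.merchant_id, l.cat_id
        """,
        (n,),
    ).fetchall()
    con.close()
    return rows


print(first_n_categories_per_merchant(5))
```

The correlated count runs once per row, so on large tables it relies on an index over (merchant_id, cat_id) to stay usable.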

SQL Inner Joins Giving Incorrect Count Values in MS ACCESS

I am trying to use joins to connect multiple tables in MS Access to get count values, but the query gives wrong counts and I don't know why. If I join the tables individually, I get the correct counts.
I have 3 tables. Table2 and Table3 are independent of each other and are both connected to Table1. Test2 and Test3 are basically text values, and I want to count the rows.
Table1 (ID1 (Primary Key), Name)
Table2 (ID2 (Primary Key), ID1 (Foreign Key), Test2)
Table3 (ID3 (Primary Key), ID1 (Foreign Key), Test3)
The Query that I get from MS Access is given below:
SELECT Table1.ID1, Count(Table2.Test2) AS CountOfTest2, Count(Table3.Test3) AS CountOfTest3
FROM (Table1 INNER JOIN Table2 ON Table1.ID1 = Table2.ID2)
INNER JOIN Table3 ON Table1.ID1 = Table3.ID3
GROUP BY Table1.ID1;
But this gives me wrong Count Values.
Can someone please help me.
Thanks.
When I use it individually, it gives me correct count value:
SELECT Table1.ID1, Count(Table2.Test2) AS CountOfTest2
FROM Table1 INNER JOIN Table2 ON Table1.ID1 = Table2.ID1
GROUP BY Table1.ID1;
SELECT Table1.ID1, Count(Table3.Test3) AS CountOfTest3
FROM Table1 INNER JOIN Table3 ON Table1.ID1 = Table3.ID1
GROUP BY Table1.ID1;
But when I try to join Table1, Table2 and Table3 in MS Access, it gives me incorrect count values.
SELECT Table1.ID1, Count(Table2.Test2) AS CountOfTest2, Count(Table3.Test3) AS CountOfTest3
FROM (Table1 INNER JOIN Table2 ON Table1.ID1 = Table2.ID1)
INNER JOIN Table3 ON Table1.ID1 = Table3.ID1
GROUP BY Table1.ID1;
As far as I understand, it takes the count from the first join (in the parentheses) and multiplies it by the count from the other inner join.
I have tried many things but don't know what to do. Access adds the parentheses for some reason.
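If test2 and test3 are unique within each ID1, you can count distinct values (it would probably be better to count the primary keys, ID2 and ID3, instead). Note that Access SQL does not support COUNT(DISTINCT ...), so this form only works on engines such as SQL Server or MySQL:
SELECT Table1.ID1
, Count(distinct Table2.Test2) AS CountOfTest2
, Count(distinct Table3.Test3) AS CountOfTest3
FROM Table1
INNER JOIN Table2
ON Table1.ID1 = Table2.ID1
INNER JOIN Table3
ON Table1.ID1 = Table3.ID1
GROUP BY Table1.ID1;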
Otherwise, get the counts before the joins by using inline views (derived tables). In SQL Server you could use window functions, but Access needs the inline views:
SELECT A.ID1
, B.CountOfTest2
, C.CountOfTest3
FROM (Table1 AS A
INNER JOIN (SELECT Table2.ID1, Count(Table2.Test2) AS CountOfTest2
FROM Table2
GROUP BY Table2.ID1) AS B
ON A.ID1 = B.ID1)
INNER JOIN (SELECT Table3.ID1, Count(Table3.Test3) AS CountOfTest3
FROM Table3
GROUP BY Table3.ID1) AS C
ON A.ID1 = C.ID1;
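The multiplication effect is easy to reproduce. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for Access, with invented sample rows: one Table1 row that has 2 matching rows in Table2 and 3 in Table3. Joining both child tables at once produces 2 x 3 = 6 joined rows, so both counts come out as 6, while counting in inline views before joining gives 2 and 3. The function name count_demo is made up.

```python
import sqlite3


def count_demo():
    """Return (naive_counts, pre_aggregated_counts) for the same sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Table1 (ID1 INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Table2 (ID2 INTEGER PRIMARY KEY, ID1 INTEGER, Test2 TEXT);
        CREATE TABLE Table3 (ID3 INTEGER PRIMARY KEY, ID1 INTEGER, Test3 TEXT);
        INSERT INTO Table1 VALUES (1, 'first');
        INSERT INTO Table2 VALUES (1, 1, 'a'), (2, 1, 'b');
        INSERT INTO Table3 VALUES (1, 1, 'x'), (2, 1, 'y'), (3, 1, 'z');
    """)
    # Joining both child tables first: every Table2 row pairs with every Table3 row.
    naive = con.execute("""
        SELECT Table1.ID1, COUNT(Table2.Test2), COUNT(Table3.Test3)
        FROM Table1
        INNER JOIN Table2 ON Table1.ID1 = Table2.ID1
        INNER JOIN Table3 ON Table1.ID1 = Table3.ID1
        GROUP BY Table1.ID1
    """).fetchall()
    # Counting inside derived tables first: each child table is aggregated on its own.
    pre_aggregated = con.execute("""
        SELECT A.ID1, B.CountOfTest2, C.CountOfTest3
        FROM Table1 AS A
        INNER JOIN (SELECT ID1, COUNT(Test2) AS CountOfTest2
                    FROM Table2 GROUP BY ID1) AS B ON A.ID1 = B.ID1
        INNER JOIN (SELECT ID1, COUNT(Test3) AS CountOfTest3
                    FROM Table3 GROUP BY ID1) AS C ON A.ID1 = C.ID1
    """).fetchall()
    con.close()
    return naive, pre_aggregated


print(count_demo())
```

Note that the inner joins drop any Table1 row that has no match in one of the child tables; a LEFT JOIN would be needed to keep those rows.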
Yup, I had the same problem when I was trying to do this. Just use the double sql function to counteract the html and you should be good. Once the query has been doubled it will react like a C++ quota statement. If this fails you can always just quantify the source fields to adhere to the table restrictions. It's actually a piece of cake. I hope this helped.

Speeding up select where column condition exists in another table without duplicates

If I have the following two tables:
Table "a" with 2 columns: id (int) [Primary Index], column1 [Indexed]
Table "b" with 3 columns: id_table_a (int),condition1 (int),condition2 (int) [all columns as Primary Index]
I can run the following query to select rows from Table a where Table b condition1 is 1
SELECT a.id FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id_table_a=a.id && condition1=1 LIMIT 1) ORDER BY a.column1 LIMIT 50
With a couple hundred million rows in both tables this query is very slow. If I do:
SELECT a.id FROM a INNER JOIN b ON a.id=b.id_table_a && b.condition1=1 ORDER BY a.column1 LIMIT 50
It is pretty much instant but if there are multiple matching rows in table b that match id_table_a then duplicates are returned. If I do a SELECT DISTINCT or GROUP BY a.id to remove duplicates the query becomes extremely slow.
Here is an SQLFiddle showing the example queries: http://sqlfiddle.com/#!9/35eb9e/10
Is there a way to make a join without duplicates fast in this case?
*Edited to show that INNER instead of LEFT join didn't make much of a difference
*Edited to show moving condition to join did not make much of a difference
*Edited to add LIMIT
*Edited to add ORDER BY
You can try an inner join with DISTINCT:
SELECT distinct a.id
FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
If you need more than the id, avoid SELECT DISTINCT *: DISTINCT applies to the whole row, so extra columns can bring the duplicates back. List only the columns you need instead:
SELECT distinct col1, col2, col3 ....
FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
You could also add a composite index on table b that includes condition1, e.g. KEY (id_table_a, condition1).
If you can, you could also run
ANALYZE TABLE table_name;
on both tables.
Another technique is to try reversing the driving table:
SELECT distinct a.id
FROM b INNER JOIN a ON a.id=b.id_table_a AND b.condition1=1
so that the most selective table leads the query.
With this version the index usage appears to differ; see http://sqlfiddle.com/#!9/35eb9e/15 (the last query adds a "Using where").
# USING DISTINCT TO REMOVE DUPLICATES without col and order
EXPLAIN
SELECT DISTINCT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
;
It looks like I found the answer.
SELECT a.id FROM a
INNER JOIN b ON
b.id_table_a=a.id &&
b.condition1=1 &&
b.condition2=(select b.condition2 from b WHERE b.id_table_a=a.id && b.condition1=1 LIMIT 1)
ORDER BY a.column1
LIMIT 5;
I don't know if there is a flaw in this or not, please let me know if so. If anyone has a way to compress this somehow I will gladly accept your answer.
SELECT id FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
Move the condition into the ON clause of the join so that the index on table b can be used to filter. Also use INNER JOIN rather than LEFT JOIN.
Then fewer rows are left to be grouped.
Wrap the fast version in a query that handles de-duping and limit:
SELECT DISTINCT * FROM (
SELECT a.id, a.column1
FROM a
JOIN b ON a.id = b.id_table_a && b.condition1 = 1
) x
ORDER BY column1
LIMIT 50
We know the inner query is fast. The de-duping and ordering has to happen somewhere. This way it happens on the smallest rowset possible.
See SQLFiddle.
Option 2:
Try the following:
Create indexes as follows:
create index a_id_column1 on a(id, column1);
create index b_id_table_a_condition1 on b(id_table_a, condition1);
These are covering indexes - ones that contain all the columns you need for the query, which in turn means that index-only access to data can achieve the result.
Then try this:
SELECT * FROM (
SELECT a.id, MIN(a.column1) column1
FROM a
JOIN b ON a.id = b.id_table_a
AND b.condition1 = 1
GROUP BY a.id) x
ORDER BY column1
LIMIT 50
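Before comparing speed, it is worth confirming that the rewrites return the same ids as the original EXISTS query. The sketch below does that on a tiny invented data set, using SQLite through Python's sqlite3 module; it says nothing about performance on hundreds of millions of rows in MySQL, and the function name compare_queries is made up. The column1 values are all distinct so that the ordering is unambiguous.

```python
import sqlite3


def compare_queries():
    """Run the EXISTS, plain-join and GROUP BY variants and return their id lists."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY, column1 INTEGER);
        CREATE TABLE b (id_table_a INTEGER, condition1 INTEGER, condition2 INTEGER,
                        PRIMARY KEY (id_table_a, condition1, condition2));
        INSERT INTO a VALUES (1, 30), (2, 10), (3, 20), (4, 5);
        INSERT INTO b VALUES (1, 1, 0), (1, 1, 1), (2, 1, 0),
                             (3, 0, 0), (4, 1, 0), (4, 1, 1), (4, 1, 2);
    """)
    queries = {
        # Original, slow-on-large-tables form: no duplicates by construction.
        "exists": """
            SELECT a.id FROM a
            WHERE EXISTS (SELECT 1 FROM b
                          WHERE b.id_table_a = a.id AND b.condition1 = 1)
            ORDER BY a.column1 LIMIT 50
        """,
        # Fast form: one output row per matching b row, so ids repeat.
        "join": """
            SELECT a.id FROM a
            JOIN b ON a.id = b.id_table_a AND b.condition1 = 1
            ORDER BY a.column1 LIMIT 50
        """,
        # De-duplicated form: group inside, order and limit outside.
        "grouped": """
            SELECT id FROM (
                SELECT a.id, MIN(a.column1) AS column1
                FROM a
                JOIN b ON a.id = b.id_table_a AND b.condition1 = 1
                GROUP BY a.id) x
            ORDER BY column1 LIMIT 50
        """,
    }
    result = {name: [row[0] for row in con.execute(sql)] for name, sql in queries.items()}
    con.close()
    return result


print(compare_queries())
```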
Use your fast query in a subselect and remove the duplicates in the outer select:
SELECT DISTINCT sub.id
FROM (
SELECT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a && b.condition1=1
WHERE b.id_table_a > :offset
ORDER BY a.column1
LIMIT 50
) sub
Because duplicates are removed, you might get fewer than 50 rows. Just repeat the query until you have enough rows. Start with :offset = 0, and use the last ID from the previous result as :offset in the following queries.
If you know your statistics, you can also use two limits. The limit in the inner query should be high enough to return 50 distinct rows with a probability that is high enough for you.
SELECT DISTINCT sub.id
FROM (
SELECT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a && b.condition1=1
ORDER BY a.column1
LIMIT 1000
) sub
LIMIT 50
For example: if you have an average of 10 duplicates per ID, LIMIT 1000 in the inner query will return an average of 100 distinct rows. It's very unlikely that you get fewer than 50 rows.
If the condition2 column is a boolean, you know that you can have a maximum of two duplicates. In this case LIMIT 100 in the inner query would be enough.

Inner join on two different fields of 2 different tables

I have one table which has fields X,Y,Z,BAGID.
The BAGID is in the form of (12345-400) where 12345 is the user's id and 400 is the BAG's id.
I have another table which has fields A,B,C,USERID.
The USERID is in the form of 12345 which is same as the first part of BAGID.
So is it possible to join these two tables on the common USERID and get the fields USERID,X,Y,A,B?
Table 1:
X Y Z BAGID(userid+bagid)
1 2 4 12345-400
Table 2 :
A B C USERID
3 5 7 12345
I want the output as:
X Y A B USERID
1 2 3 5 12345
Is it possible to join these two tables?
select Table1.X, Table1.Y, Table2.A, Table2.B, Table2.USERID
from Table1
inner join Table2
on Table1.BAGID = Table2.USERID;
I know I cannot use BAGID and USERID directly, as they are different. But is it possible to use the userid part of Table1's BAGID, which is the same as Table2's USERID?
Any help would be appreciated.
You can use SUBSTRING_INDEX to extract the USERID from BAGID:
select Table1.X, Table1.Y, Table2.A, Table2.B, Table2.USERID
from Table1
inner join Table2 on SUBSTRING_INDEX(Table1.BAGID, '-', 1) = Table2.USERID
SUBSTRING_INDEX(BAGID, '-', 1) returns everything before the first '-', so this works as long as the user ID itself contains no '-'.
Demo here
Sure, just join on LEFT(BAGID, 5). Depending on the USERID data type you may need to CAST it as well.
If the USERID portion of BAGID is variable-length, you first need to find its length using INSTR(BAGID, '-'), which returns the position of the dash; the USERID is one character shorter.
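The variable-length case can be sketched as follows, using SQLite through Python's sqlite3 module. SQLite has no LEFT or SUBSTRING_INDEX, so the sketch uses substr together with instr; in MySQL the analogous expression would be LEFT(BAGID, INSTR(BAGID, '-') - 1). The second sample row, the integer type of USERID and the function name join_on_user_part are assumptions made for illustration.

```python
import sqlite3


def join_on_user_part():
    """Join Table1 to Table2 on the part of BAGID that precedes the first '-'."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Table1 (X INTEGER, Y INTEGER, Z INTEGER, BAGID TEXT);
        CREATE TABLE Table2 (A INTEGER, B INTEGER, C INTEGER, USERID INTEGER);
        INSERT INTO Table1 VALUES (1, 2, 4, '12345-400'), (9, 9, 9, '77-1');
        INSERT INTO Table2 VALUES (3, 5, 7, 12345), (6, 6, 6, 77), (0, 0, 0, 999);
    """)
    rows = con.execute("""
        SELECT t1.X, t1.Y, t2.A, t2.B, t2.USERID
        FROM Table1 AS t1
        INNER JOIN Table2 AS t2
          -- instr gives the 1-based position of the dash; the user id ends one character earlier.
          ON CAST(substr(t1.BAGID, 1, instr(t1.BAGID, '-') - 1) AS INTEGER) = t2.USERID
        ORDER BY t2.USERID
    """).fetchall()
    con.close()
    return rows


print(join_on_user_part())
```

Joining on an expression like this generally prevents the use of an index on BAGID, so storing the user id in its own column is worth considering for large tables.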
If you're using T-SQL you can use the SUBSTRING(expression, start, length) function to get only the first 5 characters of the bag id, and then join on that value, i.e.
SELECT *
FROM table1
INNER JOIN table2 ON SUBSTRING(table1.bagid, 1, 5) = table2.userid
If you're not using T-SQL, whatever you're using should have a similar substring function.
You can inner join on a substring of the Table1 column:
Select Table1.X, Table1.Y, Table2.A, Table2.B, Table2.USERID
From Table1
Inner join Table2
ON SUBSTRING_INDEX(Table1.BAGID,'-',1) = Table2.USERID;

DISTINCT in mysql query removing the records from resultset

I have three tables
TBL1      TBL2        TBL3
----      ----        ----
tbl1_id   tbl2_id     tbl3_id
cid       fkcid       fkcid
          fktbl1_id   fktbl2_id
I have a query to get records from TBL3:
select distinct tbl3.* from TBL3 tbl3
inner join TBL2 tbl2 on tbl2.tbl2_id = tbl3.fktbl2_id and tbl2.fkcid = tbl3.fkcid
inner join TBL1 tbl1 on tbl1.tbl1_id = tbl2.fktbl1_id and tbl2.fkcid = tbl1.cid;
This query gives me around 1000 records.
But when I remove DISTINCT from the query, it gives me around 1100 records.
There are no duplicate records in the table. I also confirmed that these extra 100 records are not duplicates. Please note that these extra 100 records do not appear in the result of the query with the DISTINCT keyword.
Why is this query behaving unexpectedly? Please help me understand, and correct me if I am making a mistake.
Thank you
You have multiple records in tbl1 or tbl2 that map to the same tbl3, and since you're only selecting tbl3.* in your output, DISTINCT removes the duplication. To instead find what the duplicates are, remove the DISTINCT, add a COUNT(*) to the SELECT clause, and add at the end a GROUP BY and HAVING, such as:
select tbl3.*, count(*)
from TBL3 tbl3
inner join TBL2 tbl2 on tbl2.tbl2_id = tbl3.fktbl2_id and tbl2.fkcid = tbl3.fkcid
inner join TBL1 tbl1 on tbl1.tbl1_id = tbl2.fktbl1_id and tbl2.fkcid = tbl1.cid
group by tbl3.tbl3_id, tbl3.fkcid, tbl3.fktbl2_id having count(*) > 1;
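The effect can be reproduced on a tiny data set. In the sketch below (SQLite through Python's sqlite3 module, invented rows, made-up function name distinct_demo), tbl2_id 10 appears twice in TBL2, which is an assumption about how the real data might look. The TBL3 row that points at it therefore comes back twice from the join, DISTINCT collapses it to one row, and the GROUP BY ... HAVING query pinpoints it.

```python
import sqlite3


def distinct_demo():
    """Return (plain, distinct, duplicated) tbl3_id results for the three-table join."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE TBL1 (tbl1_id INTEGER, cid INTEGER);
        CREATE TABLE TBL2 (tbl2_id INTEGER, fkcid INTEGER, fktbl1_id INTEGER);
        CREATE TABLE TBL3 (tbl3_id INTEGER, fkcid INTEGER, fktbl2_id INTEGER);
        INSERT INTO TBL1 VALUES (1, 100), (2, 100);
        -- tbl2_id 10 occurs twice: this is what fans out the join.
        INSERT INTO TBL2 VALUES (10, 100, 1), (10, 100, 2), (11, 100, 1);
        INSERT INTO TBL3 VALUES (1000, 100, 10), (1001, 100, 11);
    """)
    joins = """
        FROM TBL3 AS t3
        INNER JOIN TBL2 AS t2 ON t2.tbl2_id = t3.fktbl2_id AND t2.fkcid = t3.fkcid
        INNER JOIN TBL1 AS t1 ON t1.tbl1_id = t2.fktbl1_id AND t2.fkcid = t1.cid
    """
    plain = con.execute(
        "SELECT t3.tbl3_id " + joins + " ORDER BY t3.tbl3_id"
    ).fetchall()
    distinct = con.execute(
        "SELECT DISTINCT t3.tbl3_id " + joins + " ORDER BY t3.tbl3_id"
    ).fetchall()
    duplicated = con.execute(
        "SELECT t3.tbl3_id, COUNT(*) " + joins
        + " GROUP BY t3.tbl3_id HAVING COUNT(*) > 1"
    ).fetchall()
    con.close()
    return plain, distinct, duplicated


print(distinct_demo())
```

The extra rows without DISTINCT are repeats of TBL3 rows that are already in the result, which is why they look like "new" records when only the row count is compared.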