How to limit list by using HAVING in MySQL?

I am trying to limit a result set to 5 rows per merchant_id by using HAVING in MySQL 5.7. Unfortunately this does not seem to work and I cannot figure out why.
My SQL query joins three tables together and identifies the categories in which the manufacturer has a listing. I want to limit this list to 5 per merchant_id:
SELECT
mcs.CAT_ID
FROM tbl1 mc
INNER JOIN tbl2 mcs ON mc.ID = mcs.CAT_ID
INNER JOIN tbl3 p ON mcs.ARTICLE_ID = p.SKU
WHERE
p.MANUFACTURER_ID =18670
group by
mc.merchant_ID, mcs.CAT_ID
HAVING
COUNT(mc.merchant_id) < 5
I read on SO that HAVING gets executed without looking at the WHERE clause, but what would be the right way to limit this list?

You didn't provide tables schema and dummy data, so I can't be sure about the exact query, but I'd use the following approach:
SELECT
mc.merchant_id, t.CAT_ID
FROM tbl1 mc
INNER JOIN (
SELECT mcs.CAT_ID
FROM tbl2 AS mcs
WHERE mc.ID = mcs.CAT_ID
AND EXISTS (
SELECT 'x'
FROM tbl3 AS p
WHERE p.SKU = mcs.ARTICLE_ID
AND p.MANUFACTURER_ID = 18670
)
LIMIT 5
) as t
;
With the subquery in the join I select all the CAT_IDs related to that mc.ID that have a listing for the selected manufacturer (18670), limited to 5 rows. This way the limit of 5 is applied to each merchant_id.
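One caveat: MySQL 5.7 does not let a derived table reference a column of another table in the same FROM clause (here mc.ID), so the query above will probably fail there with an unknown-column error. Lateral derived tables only arrived in MySQL 8.0.14. As for the HAVING in the question, it filters whole (merchant_id, CAT_ID) groups by their joined row count, so it cannot cap the number of rows per merchant. Below is a minimal sketch that should run on 5.7. It assumes "5 per merchant" means the 5 lowest CAT_IDs for each merchant, which the question does not actually specify.

```sql
-- Keep a (merchant_id, CAT_ID) pair only if fewer than 5 pairs of the
-- same merchant have a lower CAT_ID, i.e. the 5 lowest per merchant.
-- ASSUMPTION: "lowest CAT_ID first" is the desired order.
SELECT x.merchant_id, x.CAT_ID
FROM (
    SELECT DISTINCT mc.merchant_id, mcs.CAT_ID
    FROM tbl1 mc
    INNER JOIN tbl2 mcs ON mc.ID = mcs.CAT_ID
    INNER JOIN tbl3 p ON mcs.ARTICLE_ID = p.SKU
    WHERE p.MANUFACTURER_ID = 18670
) x
LEFT JOIN (
    -- same list again; 5.7 has no CTEs, so it is repeated
    SELECT DISTINCT mc.merchant_id, mcs.CAT_ID
    FROM tbl1 mc
    INNER JOIN tbl2 mcs ON mc.ID = mcs.CAT_ID
    INNER JOIN tbl3 p ON mcs.ARTICLE_ID = p.SKU
    WHERE p.MANUFACTURER_ID = 18670
) y ON y.merchant_id = x.merchant_id
   AND y.CAT_ID < x.CAT_ID
GROUP BY x.merchant_id, x.CAT_ID
HAVING COUNT(y.CAT_ID) < 5
ORDER BY x.merchant_id, x.CAT_ID;
```

The self-join compares every pair of categories per merchant, so it is only practical when each merchant has a modest number of matching categories.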

Speeding up select where column condition exists in another table without duplicates

If I have the following two tables:
Table "a" with 2 columns: id (int) [Primary Index], column1 [Indexed]
Table "b" with 3 columns: id_table_a (int),condition1 (int),condition2 (int) [all columns as Primary Index]
I can run the following query to select rows from Table a where Table b condition1 is 1
SELECT a.id FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id_table_a=a.id && condition1=1 LIMIT 1) ORDER BY a.column1 LIMIT 50
With a couple hundred million rows in both tables this query is very slow. If I do:
SELECT a.id FROM a INNER JOIN b ON a.id=b.id_table_a && b.condition1=1 ORDER BY a.column1 LIMIT 50
It is pretty much instant but if there are multiple matching rows in table b that match id_table_a then duplicates are returned. If I do a SELECT DISTINCT or GROUP BY a.id to remove duplicates the query becomes extremely slow.
Here is an SQLFiddle showing the example queries: http://sqlfiddle.com/#!9/35eb9e/10
Is there a way to make a join without duplicates fast in this case?
*Edited to show that INNER instead of LEFT join didn't make much of a difference
*Edited to show moving condition to join did not make much of a difference
*Edited to add LIMIT
*Edited to add ORDER BY
You can try with inner join and distinct
SELECT distinct a.id
FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
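But when using DISTINCT with SELECT *, make sure that a column such as id does not make otherwise identical rows distinct, which would return the wrong result. In that case list only the columns you need: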
SELECT distinct col1, col2, col3 ....
FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
You could also add a composite index that includes condition1, e.g. KEY (id_table_a, condition1) on table b.
If you can, you could also run
ANALYZE TABLE table_name;
on both tables.
Another technique is to try reversing the leading table:
SELECT distinct a.id
FROM b INNER JOIN a ON a.id=b.id_table_a AND b.condition1=1
so that the most selective table leads the query.
With this the index usage seems to differ: http://sqlfiddle.com/#!9/35eb9e/15 (the last query adds a "Using where").
# USING DISTINCT TO REMOVE DUPLICATES without col and order
EXPLAIN
SELECT DISTINCT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
;
It looks like I found the answer.
SELECT a.id FROM a
INNER JOIN b ON
b.id_table_a=a.id &&
b.condition1=1 &&
b.condition2=(select b.condition2 from b WHERE b.id_table_a=a.id && b.condition1=1 LIMIT 1)
ORDER BY a.column1
LIMIT 5;
I don't know if there is a flaw in this or not, please let me know if so. If anyone has a way to compress this somehow I will gladly accept your answer.
SELECT id FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
Move the condition into the ON clause of the join; that way the index on table b can be used to filter. Also use INNER JOIN rather than LEFT JOIN.
Then you should have fewer results that have to be grouped.
Wrap the fast version in a query that handles de-duping and limit:
SELECT DISTINCT * FROM (
-- column1 must be selected here so the outer ORDER BY can see it
SELECT a.id, a.column1
FROM a
JOIN b ON a.id = b.id_table_a && b.condition1 = 1
) x
ORDER BY column1
LIMIT 50
We know the inner query is fast. The de-duping and ordering has to happen somewhere. This way it happens on the smallest rowset possible.
See SQLFiddle.
Option 2:
Try the following:
Create indexes as follows:
create index a_id_column1 on a(id, column1);
create index b_id_table_a_condition1 on b(id_table_a, condition1);
These are covering indexes - ones that contain all the columns you need for the query, which in turn means that index-only access to data can achieve the result.
Then try this:
SELECT * FROM (
SELECT a.id, MIN(a.column1) column1
FROM a
JOIN b ON a.id = b.id_table_a
AND b.condition1 = 1
GROUP BY a.id) x
ORDER BY column1
LIMIT 50
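To check whether MySQL really uses the indexes as covering indexes, you can put EXPLAIN in front of the inner query. This is only a sketch for inspecting the plan; the actual output depends on your data and MySQL version.

```sql
-- Show the execution plan of the grouped inner query.
EXPLAIN
SELECT a.id, MIN(a.column1) AS column1
FROM a
JOIN b ON a.id = b.id_table_a
AND b.condition1 = 1
GROUP BY a.id;
-- In the result, "Using index" in the Extra column of a table's row
-- means that table is read from the index alone (covering access).
```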
Use your fast query in a subselect and remove the duplicates in the outer select:
SELECT DISTINCT sub.id
FROM (
SELECT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a && b.condition1=1
WHERE b.id_table_a > :offset
ORDER BY a.column1
LIMIT 50
) sub
Because duplicates are removed, you might get fewer than 50 rows. Just repeat the query until you have enough rows. Start with :offset = 0, then use the last ID from the previous result as :offset in the following queries.
If you know your statistics, you can also use two limits. The limit in the inner query should be high enough to return 50 distinct rows with a probability which is high enough for you.
SELECT DISTINCT sub.id
FROM (
SELECT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a && b.condition1=1
ORDER BY a.column1
LIMIT 1000
) sub
LIMIT 50
For example: if you have an average of 10 duplicates per ID, LIMIT 1000 in the inner query will return an average of 100 distinct rows. It's very unlikely that you get fewer than 50 rows.
If the condition2 column is a boolean, you know that you can have a maximum of two duplicates. In this case LIMIT 100 in the inner query would be enough.

Count distinct values with SELECT query results with error

I have a weird situation. I need to select all data from a table along with distinct values from another table.
Here is the database schema I need to get the distinct values from:
When I run both queries without INNER JOIN they run without error, but when I use INNER JOIN I get an error.
This is query that I used:
SELECT * FROM `todo`
INNER JOIN
SELECT `task`.`status`,COUNT(*) as count FROM `task`
ON `todo`.`id`=`task`.`id_list` WHERE `todo`.`user_id` = 43
As you can see, I need to get the total count per value of the status column from the other table. Can it be done in a single query, or do I need to run two queries to get the data?
You need to wrap the joined subquery in parentheses:
SELECT td.*, t.*
FROM `todo` td
JOIN
( SELECT id_list, SUM(status=0) as status_0, SUM(status=1) as status_1 -- bare status dropped: it is neither grouped nor aggregated
FROM `task`
GROUP BY id_list
) t ON td.id= t.id_list
WHERE td.user_id = 43
You can do this in one query. Even without a subquery:
SELECT ta.status, COUNT(*) as count
FROM todo t INNER JOIN
task ta
ON t.id = ta.id_list
WHERE t.user_id = 43
GROUP BY ta.status;
EDIT:
If the above produces what you want, then you probably need:
SELECT t.*, ta.status, taa.cnt
FROM todo t INNER JOIN
task ta
ON t.id = ta.id_list INNER JOIN
(SELECT count(*) as cnt, ta.status
FROM task ta
GROUP BY ta.status
) taa
on ta.status = taa.status
WHERE t.user_id = 43 ;
You seem to want a summary at the status level, which is only in task. But you want the information at the row level for todo.

SQL - selecting userid which has max number of not null rows in a different table

I've 3 tables say A,B,C.
Table A has userid column.
Table B has caid column.
Table C has lisid and image columns.
One userid can have one or several caids.
One caid can have one or several lisids.
How do I select the userid which has the maximum number of rows where the image column is not null? (For some lisids the image column is blank and for others it has a value.)
Can someone please help?
Presumably, the ids are spread among the tables in a reasonable fashion. If so, the following should do this:
select b.userid, count(*)
from TableB b join
TableC c
on b.caid = c.caid
where c.image is not null
group by b.userid
order by count(*) desc
limit 1
The question in the comments is how you connect TableA to TableB and TableB to TableC. The reasonable approach is to have the userid in TableB and the caid in TableC.
Getting all the rows with the max requires a bit more work. Essentially, you have to join in the above query to get the list:
select s.*
from (select b.userid, count(*) as cnt
from TableB b join
TableC c
on b.caid = c.caid
where c.image is not null
group by b.userid
) s join
(select count(*) as maxcnt
from TableB b join
TableC c
on b.caid = c.caid
where c.image is not null
group by b.userid
order by count(*) desc
limit 1
) smax
on s.cnt = smax.maxcnt
Other databases have a set of functions called window functions/ranking functions that make this sort of query much simpler. Alas, MySQL does not offer these before version 8.0.
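MySQL 8.0 added window functions, so on that version the "all users tied for the max" query can be sketched as below. It uses the same assumed layout as above (userid in TableB, caid in TableC). The alias rnk is my own choice, picked because RANK is a reserved word in 8.0.

```sql
-- MySQL 8.0+ only: rank users by their number of non-null images
-- and keep every user that shares first place.
select s.userid, s.cnt
from (select b.userid,
             count(*) as cnt,
             rank() over (order by count(*) desc) as rnk
      from TableB b join
           TableC c
           on b.caid = c.caid
      where c.image is not null
      group by b.userid
     ) s
where s.rnk = 1;
```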

MySQL query advice request

Can I get some help with this query. I will explain in details. To make it easier I'll take an example with HTML tags and attributes.
I have 3 tables:
tbl1 - contains all the HTML tags (tagId, tagName)
tbl2 - contains all the available attributes that might appear in the tags (attId, attName)
tbl3 - is a map table for tbl1 and tbl2 (tagId, attId)
I want to select all the attributes (and related information about the attributes) that belong to the chosen tag (in the example below tag id 4) and the ID of the tag.
Here is an example of what I'd like to get from the query:
attId tagId attName
50 4 The name of the attribute with id 50
89 4 The name of the attribute with id 89
114 4 The name of the attribute with id 114
Below is the query that I've made, but I believe there is a better way.
SELECT tbl2.*, tbl3.tagId
FROM tbl2 JOIN tbl3
WHERE tbl3.attId IN (
SELECT tbl3.attId FROM tbl3 where tbl3.tagId=4
)
AND tbl3.attId = tbl2.attId
GROUP BY tbl3.attId
Thanks in advance.
Issue 1
You're doing a cross join that you later reduce to an inner join by putting a filter in the where clause.
This is the old-skool way when using implicit join syntax.
On most SQL-dialects (but not MySQL) this gives an error.
Never do a join without an ON clause.
If you want to do a cross join, use this syntax: select * from a cross join b
Issue 2
The sub select is really not needed, you can just do the test inside a where clause.
SELECT tbl2.*, tbl3.tagId
FROM tbl2
INNER JOIN tbl3 ON (tbl3.attId = tbl2.attId)
WHERE tbl3.tagId = 4
GROUP BY tbl3.attId
Issue 3a
You are doing a group by on tbl3.attId, which in this context is the same as doing a group by on tbl2.attId (because of the ON (tbl3.attId = tbl2.attId) clause).
The latter makes a bit more sense, because you are doing select tbl2.*, not select tbl3.*
Issue 3b
If attId is not a primary or unique key for tbl2, then the result of the query will be indeterminate.
That means that the query will select an arbitrary row from the list of possible rows with the same attId. If you don't want that you'll have to include a having clause that will choose a row based on some criterion.
Example
/*selects the most recent row per attID, instead of a random one*/
SELECT tbl2.*, tbl3.tagId
FROM tbl2
INNER JOIN tbl3 ON (tbl3.attId = tbl2.attId)
WHERE tbl3.tagId = 4
GROUP BY tbl2.attId
HAVING tbl2.dateadded = MAX(tbl2.dateadded)
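A word of caution on the HAVING example: as far as I can tell, HAVING is checked after MySQL has already picked one row's dateadded for the group, so the comparison can throw the whole group away instead of picking the newest row. A more dependable sketch joins against the newest date per attId. It uses the same hypothetical dateadded column as the example above; the alias names latest and maxdate are mine.

```sql
-- Newest tbl2 row per attId for tag 4.
-- ASSUMPTION: tbl2 has a dateadded column, as in the example above.
SELECT tbl2.*, tbl3.tagId
FROM tbl2
INNER JOIN tbl3 ON (tbl3.attId = tbl2.attId)
INNER JOIN (
    -- newest dateadded for every attId
    SELECT attId, MAX(dateadded) AS maxdate
    FROM tbl2
    GROUP BY attId
) latest ON latest.attId = tbl2.attId
        AND latest.maxdate = tbl2.dateadded
WHERE tbl3.tagId = 4;
```

If two rows of the same attId share the newest dateadded, both are returned, so a tie-breaker is still needed in that case.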
You can do it like this instead.
SELECT tbl2.*, tbl3.tagId
FROM tbl2 JOIN tbl3
ON tbl3.attId = tbl2.attId
WHERE tbl3.tagId = <selected id>
GROUP BY tbl3.attId
SELECT tbl2.attId, tbl1.tagId, tbl2.attName FROM tbl3, tbl1, tbl2
WHERE tbl1.tagId = tbl3.tagId AND tbl2.attId = tbl3.attId AND tbl3.tagId = 4

Filter out rows by hardcoded list in MySQL performance

I have a hardcoded list of values like: 1,5,7,8 and so on.
And I must filter out rows from table that have ID in list above, so I do something like this:
Select
*
from myTable m
left join othertable t
on t.REF_ID = m.ID
where m.ID not in (1,5,7,8...)
But when I have more values (like 1000) and more rows (100) in othertable and myTable, this query starts to be slow. I have an index on REF_ID and ID. It seems that the part "where m.ID not in (1,5,7,8...)" is the problem.
Is there a faster way to filter out rows by a hardcoded list of values?
Try putting your list in a temporary table as temptable.ID and doing
SELECT *
FROM myTable m
LEFT JOIN othertable t ON t.REF_ID = m.ID
LEFT JOIN temptable ON m.ID = temptable.ID
WHERE temptable.ID IS NULL
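Setting up the temporary table could look roughly like this. It is only a sketch: it assumes the IDs are integers, and the four values stand in for your real list.

```sql
-- ASSUMPTION: myTable.ID is an integer column.
-- The primary key gives the anti-join an index to probe.
CREATE TEMPORARY TABLE temptable (
    ID INT NOT NULL PRIMARY KEY
);

-- Load the hardcoded list (placeholder values from the question).
INSERT INTO temptable (ID) VALUES (1), (5), (7), (8);

-- Rows of myTable whose ID is NOT in the list.
SELECT *
FROM myTable m
LEFT JOIN othertable t ON t.REF_ID = m.ID
LEFT JOIN temptable ON m.ID = temptable.ID
WHERE temptable.ID IS NULL;

DROP TEMPORARY TABLE temptable;
```

The temporary table is only visible to the current connection and disappears when the connection closes, so the explicit DROP is optional.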