MySQL: Top K with duplicates

For example, let's say I have 10 categories (a, b, c, d, e, f, g, h, i, j) and the product counts in these categories are (5, 6, 10, 4, 10, 4, 6, 10, 10, 4). Now, if I want to find the top 5 categories with the most products:
c - 10
e - 10
h - 10
i - 10
(b,g) - 6 (with the LIMIT 5 option, sometimes it will be b and sometimes it will be g)
What I need: if several categories have the same count and there is no fixed rule for which one to return, I want the SQL query to return all of them. In the example above, I want the query to return 6 rows. If all categories had 10 products, querying for the top 5 should return all 10 rows.
I saw the question "Selecting the top 5 in a column with duplicates", but it has a different requirement.

You can achieve this with an inner select. First get the counts of the top k categories, then get all categories that have those counts.
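-- distinct: the top 5 counts here are 10,10,10,10,6, and without it each 10-count category would join 4 times
Select cat_count, category from
(select distinct top_count from
(select count(category) as top_count
from products group by category order by count(category) desc limit 5) as top5)
as t1 inner join
(select count(category) as cat_count, category
from products group by category) as t2 on t1.top_count = t2.cat_count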
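Or, written differently (MySQL does not accept LIMIT directly inside an IN subquery, so the top-5 list is wrapped in a derived table):
select count(category), category
from products
group by category
having count(category) in
(select top_count from
(select count(category) as top_count
from products
group by category
order by count(category) desc limit 5) as t)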
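A runnable check of the tie-inclusive idea, sketched with Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so it runs without a server). The query wraps the LIMIT subquery in a derived table, the form MySQL accepts inside IN. The helpers `build` and `top_k_with_ties` are hypothetical names; the data are the counts from the question.

```python
import sqlite3


def build(counts):
    """Hypothetical helper: in-memory products table with counts[c] rows for category c."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT)")
    for category, n in counts.items():
        conn.executemany("INSERT INTO products (category) VALUES (?)", [(category,)] * n)
    return conn


def top_k_with_ties(conn, k):
    """Every (count, category) whose count is one of the k largest per-category counts."""
    sql = """
        SELECT count(category) AS cat_count, category
        FROM products
        GROUP BY category
        HAVING count(category) IN (
            -- the extra derived table is the form MySQL needs for LIMIT inside IN
            SELECT top_count FROM (
                SELECT count(category) AS top_count
                FROM products
                GROUP BY category
                ORDER BY count(category) DESC
                LIMIT ?) AS t)
        ORDER BY cat_count DESC, category
    """
    return conn.execute(sql, (k,)).fetchall()


if __name__ == "__main__":
    counts = dict(zip("abcdefghij", [5, 6, 10, 4, 10, 4, 6, 10, 10, 4]))
    print(top_k_with_ties(build(counts), 5))
    # [(10, 'c'), (10, 'e'), (10, 'h'), (10, 'i'), (6, 'b'), (6, 'g')]
```

With the question's counts it returns 6 rows; if every category has 10 products, all 10 come back, because the top-5 counts collapse to the single value 10.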

Related

How to select one random row from each group in MySQL

I have a database with an Items table that looks something like this:
id
name
category (int)
There are several hundred thousand records. Each item can be in one of 7 different categories, which correspond to a categories table:
id
category
I want a query that chooses 1 random item from each category. What's the best way of approaching that? I know to use ORDER BY RAND() and LIMIT 1 for similar random queries, but I've never done something like this.
This query returns all items joined to categories in random order:
SELECT
c.id AS cid, c.category, i.id AS iid, i.name
FROM categories c
INNER JOIN items i ON c.id = i.category
ORDER BY RAND()
To restrict the result to one item per category, wrap the query in a partial GROUP BY:
SELECT * FROM (
SELECT
c.id AS cid, c.category, i.id AS iid, i.name
FROM categories c
INNER JOIN items i ON c.id = i.category
ORDER BY RAND()
) AS shuffled_items
GROUP BY cid
Note that when a query has both GROUP BY and ORDER BY clauses, the grouping is performed before sorting. This is why I have used two queries: the inner one sorts the results, the outer one groups them.
I understand that this query isn't going to win any race. I am open to suggestions.
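The outer GROUP BY above relies on MySQL keeping the first row of each group from the shuffled derived table. MySQL does not guarantee that (for nonaggregated columns it may return any row of the group, and an ORDER BY does not influence the choice), and with ONLY_FULL_GROUP_BY enabled, the default since MySQL 5.7, the query is rejected. A sketch of an alternative that avoids this: one correlated scalar subquery per category row, each shuffling only that category's items. It runs through Python's sqlite3 module as a stand-in, with SQLite's RANDOM() in place of MySQL's RAND(); `build` and `random_item_per_category` are hypothetical names, and the sample rows mirror the 9-row table shown in one of the answers.

```python
import sqlite3


def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE categories (id INTEGER PRIMARY KEY, category TEXT)")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, category INT)")
    conn.executemany("INSERT INTO categories VALUES (?, ?)", [(1, "x"), (2, "y"), (3, "z")])
    # items 1-9 named A-I; three per category (1,1,1,2,2,2,3,3,3)
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [(n, chr(ord("A") + n - 1), (n - 1) // 3 + 1) for n in range(1, 10)],
    )
    return conn


def random_item_per_category(conn):
    # The subquery is evaluated per category row, so every category yields exactly
    # one item id (or NULL if the category has no items).
    sql = """
        SELECT c.id AS cid,
               c.category,
               (SELECT i.id FROM items i
                WHERE i.category = c.id
                ORDER BY RANDOM()
                LIMIT 1) AS iid
        FROM categories c
        ORDER BY c.id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(random_item_per_category(build()))
```

The item name can be fetched by joining `items` on the returned `iid`. Like any ORDER BY RAND() approach, this still sorts each category's items, so it is not fast on very large categories.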
Here is a simple solution. Let's suppose you have this table:
id name category
1 A 1
2 B 1
3 C 1
4 D 2
5 E 2
6 F 2
7 G 3
8 H 3
9 I 3
Use this query
select
c.id,
c.category,
(select name from category where category = c.category group by id order by rand() limit 1) as CatName
from category as c
group by category
Try this
SELECT id, name, category from Items where
(
select count(*) from Items i where i.category = Items.category
GROUP BY i.category ORDER BY rand()
) <= 1
REF: http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/
Shuffle the original table (random order) before the final select:
select * from
(select category, id, name from items order by rand()) as tab
group by 1
Please note: in the following example I am assuming your table is named "items", not "Items", because you also said the other table was named "categories" (second table name not capitalized).
The SQL for what you want to do would roughly be:
SELECT items.id AS item_id,
items.name AS item_name,
items.category AS item_category_id,
categories.id AS category_id,
categories.category AS category_name
FROM items, categories
WHERE items.category = categories.id
ORDER BY rand()
LIMIT 1

Display the category list even if it does not contain any item

In connection with my question
How to display the list of categories which contain items in mysql
I would like to ask how to display the list of categories from the items even if a category does not contain any record in itemstbl. Here is my query:
SELECT *, count(*) as cnt
FROM categorytbl LEFT JOIN itemstbl
ON itemstbl.cat_id=categorytbl.cat_id
GROUP BY itemstbl.cat_id
ORDER BY cnt DESC
The result is:
Pet(1)
person(2)
I want the result to be:
Pet(1)
person(2)
Places(0)
After a couple of minutes of working on this problem, I got this answer:
SELECT categorytbl.cat_id AS cat_id , count(itemstbl.cat_id) as cnt
FROM categorytbl LEFT JOIN itemstbl
ON itemstbl.cat_id=categorytbl.cat_id
GROUP BY categorytbl.cat_id
ORDER BY cnt DESC
The result now is:
Pet(1)
person(2)
Places(0)
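The idea is to group by cat_id from categorytbl and count cat_id from itemstbl: COUNT(itemstbl.cat_id) ignores the NULLs that the LEFT JOIN produces for categories without items, so such a category is counted as 0.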
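To see the difference between the two counts, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The `cat_name` column and the sample rows are assumptions chosen to match the Pet/person/Places output. COUNT(*) counts the single NULL-extended row that the LEFT JOIN produces for an empty category, so Places shows 1; COUNT(itemstbl.cat_id) skips that NULL and shows 0.

```python
import sqlite3


def build():
    conn = sqlite3.connect(":memory:")
    # cat_name is a hypothetical column; the question only shows the names in its output
    conn.execute("CREATE TABLE categorytbl (cat_id INTEGER PRIMARY KEY, cat_name TEXT)")
    conn.execute("CREATE TABLE itemstbl (item_id INTEGER PRIMARY KEY, cat_id INT)")
    conn.executemany("INSERT INTO categorytbl VALUES (?, ?)",
                     [(1, "Pet"), (2, "person"), (3, "Places")])
    conn.executemany("INSERT INTO itemstbl VALUES (?, ?)", [(1, 1), (2, 2), (3, 2)])
    return conn


def category_counts(conn):
    # COUNT(*) counts the NULL-extended row the LEFT JOIN produces for an empty
    # category; COUNT(itemstbl.cat_id) skips that NULL, so the empty category gets 0.
    sql = """
        SELECT categorytbl.cat_name,
               COUNT(itemstbl.cat_id) AS cnt,
               COUNT(*) AS joined_rows
        FROM categorytbl
        LEFT JOIN itemstbl ON itemstbl.cat_id = categorytbl.cat_id
        GROUP BY categorytbl.cat_id, categorytbl.cat_name
        ORDER BY cnt DESC
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(category_counts(build()))
    # [('person', 2, 2), ('Pet', 1, 1), ('Places', 0, 1)]
```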

Select items from categories and item should also be in another category

I have one issue that I'm unable to solve.
The problem is with an SQL query: I need to select items from some categories that should also be in another category.
Basically, it is something like this:
select * from items where category_id in (1,2,3,4,5) and category_id = 6
Of course, the statement above will not work. I have tried:
select * from items where category_id in (1,2,3,4,5,6)
GROUP BY item_id HAVING COUNT(DISTINCT category_id) > 1
but this also gives me items that exist in several of categories 1-5 without being in category 6, for example items in categories 1, 2 and 3 but not in category 6.
Any suggestions on how to solve this?
One way is to filter the resulting groups, by counting the number of constituent records that match each of your criteria and combining with suitable logical operations:
SELECT item_id
FROM items
WHERE category_id IN (1,2,3,4,5,6) -- only for performance, if indexed
GROUP BY item_id
HAVING SUM(category_id IN (1,2,3,4,5))
AND SUM(category_id = 6)
Another way would be to use a self-join, with each side of the join filtering the groups according to a different criterion:
SELECT item_id
FROM items a JOIN items b USING (item_id)
WHERE a.category_id IN (1,2,3,4,5)
AND b.category_id = 6
GROUP BY item_id
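Both answers can be checked side by side with a small sketch using Python's sqlite3 module as a stand-in for MySQL (SQLite, like MySQL, evaluates a comparison to 0 or 1, so SUM(condition) counts matching rows). The sample data and function names are hypothetical; item 10 is the problem case from the question. The self-join uses ON with qualified columns instead of USING only to keep every reference unambiguous.

```python
import sqlite3

HAVING_SQL = """
    SELECT item_id
    FROM items
    WHERE category_id IN (1, 2, 3, 4, 5, 6)
    GROUP BY item_id
    HAVING SUM(category_id IN (1, 2, 3, 4, 5)) > 0
       AND SUM(category_id = 6) > 0
    ORDER BY item_id
"""

SELF_JOIN_SQL = """
    SELECT a.item_id
    FROM items a JOIN items b ON a.item_id = b.item_id
    WHERE a.category_id IN (1, 2, 3, 4, 5)
      AND b.category_id = 6
    GROUP BY a.item_id
    ORDER BY a.item_id
"""


def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (item_id INT, category_id INT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", [
        (10, 1), (10, 2), (10, 3),  # only in 1-5: must be excluded
        (20, 1), (20, 6),           # in 1 and 6: wanted
        (30, 6),                    # only in 6: excluded
        (40, 4), (40, 5), (40, 6),  # in 4, 5 and 6: wanted
    ])
    return conn


def matching_items(conn, sql):
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = build()
    print(matching_items(conn, HAVING_SQL), matching_items(conn, SELF_JOIN_SQL))
    # [20, 40] [20, 40]
```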

SQL top records based on two tables relations

I have three main items I am storing: Articles, Entities, and Keywords. This makes 5 tables:
article { id }
entity {id, name}
article_entity {id, article_id, entity_id}
keyword {id, name}
article_keyword {id, article_id, keyword_id}
I would like to get all articles that contain the TOP X keywords + entities. I can get the top X keywords or entities with a simple group by on the entity_id/keyword_id.
SELECT [entity|keyword]_id, count(*) as num FROM article_entity
GROUP BY entity_id ORDER BY num DESC LIMIT 10
How would I get all articles that have a relation to the top entities and keywords?
This was what I imagined, but I know it doesn't work because the GROUP BY on entity limits the article_ids that are returned.
SELECT * FROM article
WHERE EXISTS (
[... where article is mentioned in top X entities.. ]
) AND EXISTS (
[... where article is mentioned in top X keywords.. ]
);
If I understand you correctly, the objective of the query is to find the articles that are related both to one of the top 10 entities and to one of the top 10 keywords. If this is the case, the following query should do that, by requiring that each returned article has a match in both the set of top 10 entities and the set of top 10 keywords.
Please give it a try.
SELECT a.id
FROM article a
INNER JOIN article_entity ae ON a.id = ae.article_id
INNER JOIN article_keyword ak ON a.id = ak.article_id
INNER JOIN (
SELECT entity_id, COUNT(article_id) AS article_entity_count
FROM article_entity
GROUP BY entity_id
ORDER BY article_entity_count DESC LIMIT 10
) top_ae ON ae.entity_id = top_ae.entity_id
INNER JOIN (
SELECT keyword_id, COUNT(article_id) AS article_keyword_count
FROM article_keyword
GROUP BY keyword_id
ORDER BY article_keyword_count DESC LIMIT 10
) top_ak ON ak.keyword_id = top_ak.keyword_id
GROUP BY a.id;
The downside to using a simple LIMIT 10 in the two subqueries for top entities/keywords is that it won't handle ties, so if the 11th keyword is just as popular as the 10th, it still won't be chosen. This can be fixed by using a ranking function, but MySQL versions before 8.0 have nothing built in for this (MySQL 8.0 added window functions such as RANK(), like those in Oracle or MSSQL).
I set up a sample SQL Fiddle (but using fewer data points and LIMIT 2, as I'm lazy).
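On MySQL 8.0 or later, ties can be kept with RANK(). A minimal sketch, run through Python's sqlite3 module as a stand-in (it assumes the linked SQLite library is 3.25 or newer, the version that added window functions): the counts are computed in an inner derived table, ranked in an outer one, and every keyword whose rank is within the top n is kept. Function names and sample data are hypothetical.

```python
import sqlite3


def build(pairs):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE article_keyword "
                 "(id INTEGER PRIMARY KEY, article_id INT, keyword_id INT)")
    conn.executemany("INSERT INTO article_keyword (article_id, keyword_id) VALUES (?, ?)", pairs)
    return conn


def top_keywords_with_ties(conn, n):
    sql = """
        SELECT keyword_id, cnt FROM (
            SELECT keyword_id, cnt, RANK() OVER (ORDER BY cnt DESC) AS rnk
            FROM (SELECT keyword_id, COUNT(*) AS cnt
                  FROM article_keyword
                  GROUP BY keyword_id) AS counts
        ) AS ranked
        WHERE rnk <= ?
        ORDER BY cnt DESC, keyword_id
    """
    return conn.execute(sql, (n,)).fetchall()


if __name__ == "__main__":
    # keyword 1 is used 3 times, keywords 2 and 3 twice each, keyword 4 once
    pairs = [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (1, 3), (3, 3), (4, 4)]
    print(top_keywords_with_ties(build(pairs), 2))  # [(1, 3), (2, 2), (3, 2)]
```

RANK() gives tied rows the same rank and then skips ahead, so "top 2" here returns three keywords where LIMIT 2 would silently drop one of the tied pair.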
Not knowing the volume of data you are working with, I would first recommend adding two columns to your article table that store the count of entities and the count of keywords respectively. Then, via triggers on inserting into and deleting from each link table, update the respective counter columns. This way, you don't have to run an expensive query every time it is needed, especially in a web-based interface. You can then just select from the article table ordered by the E+K counts descending and be done with it, instead of constantly subquerying the underlying tables.
Now, that said, the other suggestions are somewhat similar to what I am posting, but they all appear to be limiting each set to 10 records. Let's throw this scenario into the picture. Say articles 1-20 each have 8, 9 or 10 entities and 1-2 keywords. Then articles 21-50 have the reverse: 8, 9 or 10 keywords and 1-2 entities. Now, articles 51-58 have 7 entities AND 7 keywords, for a combined total of 14. None of the queries would catch these, because the entity side would only return articles 1-20 and the keyword side articles 21-50. Articles 51-58 would be so far down each list that they would not even be considered, even though their total is 14.
To handle this, each subquery is a full query on the article ID and its count, simply ordered by article_id, since that is the basis of the join to the master article table.
COALESCE() then returns the count if available, otherwise 0, and the two values are added together. The results are ordered with the highest totals first, so when the limit is applied the result includes articles 51-58 from the scenario plus a few of the others.
SELECT
a.id,
coalesce( JustE.ECount, 0 ) ECount,
coalesce( JustK.KCount, 0 ) KCount,
coalesce( JustE.ECount, 0 ) + coalesce( JustK.KCount, 0 ) TotalCnt
from
article a
LEFT JOIN ( select article_id, COUNT(*) as ECount
from article_entity
group by article_id
order by article_id ) JustE
on a.id = JustE.article_id
LEFT JOIN ( select article_id, COUNT(*) as KCount
from article_keyword
group by article_id
order by article_id ) JustK
on a.id = JustK.article_id
order by
coalesce( JustE.ECount, 0 ) + coalesce( JustK.KCount, 0 ) DESC
limit 10
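A scaled-down version of that scenario, sketched with Python's sqlite3 module standing in for MySQL (data and names are hypothetical, and the ORDER BY inside the derived tables is left out since it does not affect the result): article 3 has 2 entities and 2 keywords, articles 1 and 2 are lopsided with 3 of one kind each, and article 4 has neither. Article 3 ranks first on the combined total, and article 4 still appears with zeros thanks to the LEFT JOINs and COALESCE().

```python
import sqlite3

RANKED_SQL = """
    SELECT a.id,
           COALESCE(e.ecount, 0) AS ecount,
           COALESCE(k.kcount, 0) AS kcount,
           COALESCE(e.ecount, 0) + COALESCE(k.kcount, 0) AS total
    FROM article a
    LEFT JOIN (SELECT article_id, COUNT(*) AS ecount
               FROM article_entity GROUP BY article_id) e ON a.id = e.article_id
    LEFT JOIN (SELECT article_id, COUNT(*) AS kcount
               FROM article_keyword GROUP BY article_id) k ON a.id = k.article_id
    ORDER BY total DESC, a.id
    LIMIT ?
"""


def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE article_entity (id INTEGER PRIMARY KEY, article_id INT, entity_id INT)")
    conn.execute("CREATE TABLE article_keyword (id INTEGER PRIMARY KEY, article_id INT, keyword_id INT)")
    conn.executemany("INSERT INTO article (id) VALUES (?)", [(1,), (2,), (3,), (4,)])
    conn.executemany("INSERT INTO article_entity (article_id, entity_id) VALUES (?, ?)",
                     [(1, 1), (1, 2), (1, 3), (3, 1), (3, 2)])
    conn.executemany("INSERT INTO article_keyword (article_id, keyword_id) VALUES (?, ?)",
                     [(2, 1), (2, 2), (2, 3), (3, 1), (3, 2)])
    return conn


def top_articles(conn, n):
    return conn.execute(RANKED_SQL, (n,)).fetchall()


if __name__ == "__main__":
    print(top_articles(build(), 10))
    # [(3, 2, 2, 4), (1, 3, 0, 3), (2, 0, 3, 3), (4, 0, 0, 0)]
```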
I took this in several steps.
tl;dr This shows all the articles from the top (4) keywords and entities:
Here's a fiddle
select
distinct article_id
from
(
select
article_id
from
article_entity ae
inner join
(select
entity_id, count(*)
from
article_entity
group by
entity_id
order by
count(*) desc
limit 4) top_entities on ae.entity_id = top_entities.entity_id
union all
select
article_id
from
article_keyword ak
inner join
(select
keyword_id, count(*)
from
article_keyword
group by
keyword_id
order by
count(*) desc
limit 4) top_keywords on ak.keyword_id = top_keywords.keyword_id) as articles
Explanation:
This starts with an effort to find the top X entities. (4 seemed to work for the number of associations I wanted to make in the fiddle.)
I didn't want to select articles here because that would skew the GROUP BY; you want to focus solely on the top entities. Fiddle
select
entity_id, count(*)
from
article_entity
group by
entity_id
order by
count(*) desc
limit 4
Then I selected all the articles from these top entities. Fiddle
select
*
from
article_entity ae
inner join
(select
entity_id, count(*)
from
article_entity
group by
entity_id
order by
count(*) desc
limit 4) top_entities on ae.entity_id = top_entities.entity_id
Obviously the same logic needs to happen for the keywords. The queries are then unioned together (fiddle) and the distinct article ids are pulled from the union.
This will give you all articles that have a relation to the top (x) entities and keywords.
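The union approach can be sketched the same way, with Python's sqlite3 module standing in for MySQL and hypothetical data in which entity 10 and keyword 20 are the most used (top 1 of each, to keep the data tiny). Article 2 is reached through both the entity and the keyword, and the outer DISTINCT returns it once.

```python
import sqlite3

UNION_SQL = """
    SELECT DISTINCT article_id FROM (
        SELECT ae.article_id
        FROM article_entity ae
        JOIN (SELECT entity_id FROM article_entity
              GROUP BY entity_id ORDER BY COUNT(*) DESC LIMIT ?) top_entities
          ON ae.entity_id = top_entities.entity_id
        UNION ALL
        SELECT ak.article_id
        FROM article_keyword ak
        JOIN (SELECT keyword_id FROM article_keyword
              GROUP BY keyword_id ORDER BY COUNT(*) DESC LIMIT ?) top_keywords
          ON ak.keyword_id = top_keywords.keyword_id
    ) AS articles
    ORDER BY article_id
"""


def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE article_entity (id INTEGER PRIMARY KEY, article_id INT, entity_id INT)")
    conn.execute("CREATE TABLE article_keyword (id INTEGER PRIMARY KEY, article_id INT, keyword_id INT)")
    # entity 10 is linked to articles 1 and 2; entity 11 only to article 3
    conn.executemany("INSERT INTO article_entity (article_id, entity_id) VALUES (?, ?)",
                     [(1, 10), (2, 10), (3, 11)])
    # keyword 20 is linked to articles 2 and 4; keyword 21 only to article 5
    conn.executemany("INSERT INTO article_keyword (article_id, keyword_id) VALUES (?, ?)",
                     [(2, 20), (4, 20), (5, 21)])
    return conn


def articles_for_top(conn, x):
    return [row[0] for row in conn.execute(UNION_SQL, (x, x))]


if __name__ == "__main__":
    print(articles_for_top(build(), 1))  # [1, 2, 4]
```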
This gets the articles that are among the top 10 by keyword count and also among the top 10 by entity count. You may get fewer than 10 records back, because an article may meet only one of the criteria (top by entities but not by keywords, or top by keywords but not by entities).
select *
from article a
inner join
(select count(*),ae.article_id
from article_entity ae
group by ae.article_id
order by count(*) Desc limit 10) e
on a.id = e.article_id
inner join
(select count(*),ak.article_id
from article_keyword ak
group by ak.article_id
order by count(*) Desc limit 10) k
on a.id = k.article_id
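A small sketch of that behaviour, with Python's sqlite3 module standing in for MySQL (data and names are hypothetical, and top 2 replaces top 10): the top 2 articles by entity count are 1 and 2, the top 2 by keyword count are 2 and 3, so only article 2 survives both inner joins.

```python
import sqlite3

INTERSECT_SQL = """
    SELECT a.id
    FROM article a
    JOIN (SELECT article_id FROM article_entity
          GROUP BY article_id ORDER BY COUNT(*) DESC LIMIT ?) e ON a.id = e.article_id
    JOIN (SELECT article_id FROM article_keyword
          GROUP BY article_id ORDER BY COUNT(*) DESC LIMIT ?) k ON a.id = k.article_id
    ORDER BY a.id
"""


def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE article_entity (id INTEGER PRIMARY KEY, article_id INT, entity_id INT)")
    conn.execute("CREATE TABLE article_keyword (id INTEGER PRIMARY KEY, article_id INT, keyword_id INT)")
    conn.executemany("INSERT INTO article (id) VALUES (?)", [(1,), (2,), (3,)])
    # entity counts per article: 1 -> 3, 2 -> 2, 3 -> 1
    conn.executemany("INSERT INTO article_entity (article_id, entity_id) VALUES (?, ?)",
                     [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1)])
    # keyword counts per article: 2 -> 3, 3 -> 2, 1 -> 1
    conn.executemany("INSERT INTO article_keyword (article_id, keyword_id) VALUES (?, ?)",
                     [(2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (1, 1)])
    return conn


def top_in_both(conn, n):
    return [row[0] for row in conn.execute(INTERSECT_SQL, (n, n))]


if __name__ == "__main__":
    print(top_in_both(build(), 2))  # [2]
```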

MYSQL select results from 3 tables with an array of ids

OK, so I have 3 MySQL tables that I need to extract data from. Anything to do with joins really gets me stuck!
Table 1 = products (productid, name)
Table 2 = category (categoryid, name)
Table 3 = categoryproduct (categoryid, productid) - my join table
I have an array of product IDs, and I need to get a random selection of products that fall into the same categories as these products.
The idea is that the query results will fill a section in my cart showing similar/related products that the customer may like.
So something like
SELECT name etc FROM table1
WHERE table2.categoryid of results of the query = table3.categoryid of current products
ORDER BY RAND()
LIMIT 3
How do I write that?
Assuming you're using PHP, the following code will fetch 10 related products from the database:
$productids = array(1002,789,999,203,321);
$sql = '
SELECT * FROM
products p JOIN categoryproduct pc
ON p.productid = pc.productid
WHERE pc.categoryid IN(
SELECT DISTINCT(categoryid) FROM
products inner_p JOIN categoryproduct inner_pc
ON inner_p.productid = inner_pc.productid
WHERE inner_p.productid IN('.implode(',',$productids).')
)
ORDER BY RAND()
LIMIT 10';
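The PHP answer splices the IDs into the SQL with implode(), which is only safe if $productids can never contain user input. Below is a sketch of the same idea with one placeholder per ID, using Python's sqlite3 module as a stand-in for MySQL (table names from the question; the function name and sample data are hypothetical). It also selects products through IN rather than a join, so a product that shares several categories with the cart is returned once instead of once per category.

```python
import sqlite3


def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (productid INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE categoryproduct (categoryid INT, productid INT)")
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [(n, f"product {n}") for n in range(1, 6)])
    # category 10: products 1, 2; category 20: products 2, 3; category 30: products 4, 5
    conn.executemany("INSERT INTO categoryproduct VALUES (?, ?)",
                     [(10, 1), (10, 2), (20, 2), (20, 3), (30, 4), (30, 5)])
    return conn


def related_products(conn, product_ids, limit=10):
    """Random products sharing at least one category with any of product_ids."""
    if not product_ids:  # MySQL rejects an empty IN () list
        return []
    # only "?" markers are interpolated; the IDs themselves are bound as parameters
    placeholders = ", ".join("?" for _ in product_ids)
    sql = f"""
        SELECT p.productid, p.name
        FROM products p
        WHERE p.productid IN (
            SELECT pc.productid FROM categoryproduct pc
            WHERE pc.categoryid IN (
                SELECT categoryid FROM categoryproduct
                WHERE productid IN ({placeholders})))
        ORDER BY RANDOM()
        LIMIT ?
    """
    return conn.execute(sql, [*product_ids, limit]).fetchall()


if __name__ == "__main__":
    print(related_products(build(), [2]))  # products 1, 2 and 3 in random order
```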
If I have understood your problem correctly, this query may help. Instead of the subquery, you can also pass a comma-separated list of the category IDs of the products selected by the user.
select p.name
from products p,categoryproduct cp
where p.productid=cp.productid
and cp.categoryid in(
select categoryid
from cartitems)
order by RAND()