Couchbase/N1QL: SELECT FROM list of values provided by parameter - couchbase

As a follow-up to Get top rows by "category" from a collection, I still want to get the top 5 products per categoryId, but I want to provide a pre-selected list of categoryIds that are relevant to me.
Starting with vsr's answer from the original question, I could do something like:
SELECT u.*
FROM (SELECT DISTINCT RAW p.categoryId
FROM products AS p
WHERE p.categoryId IN $categoryIds) AS c
UNNEST (SELECT p1.*
FROM products AS p1
WHERE p1.categoryId = c
ORDER BY p1.categoryId, p1.price DESC
LIMIT 5) AS u;
where the named parameter $categoryIds will be provided as an array ['cat1', 'cat2'].
It feels a bit inefficient to run SELECT DISTINCT RAW p.categoryId FROM products AS p WHERE p.categoryId IN $categoryIds just to get back what is essentially my list of provided categoryIds again.
I am sure there is a more efficient way to express this. Something like:
SELECT u.*
FROM (VALUES IN $categoryIds) AS c
UNNEST ...;

CREATE INDEX ix1 ON products(categoryId, price DESC);
With this index, the subquery inside the UNNEST can use index order and retrieve only the top 5 entries per category, irrespective of the number of entries in a specific category.
If $categoryIds contains unique entries:
SELECT u.*
FROM $categoryIds AS c
UNNEST (SELECT p1.*
FROM products AS p1
WHERE p1.categoryId = c
ORDER BY p1.categoryId, p1.price DESC
LIMIT 5) AS u;
For non-unique entries
SELECT u.*
FROM (SELECT DISTINCT RAW c1
FROM $categoryIds AS c1 ) AS c
UNNEST (SELECT p1.*
FROM products AS p1
WHERE p1.categoryId = c
ORDER BY p1.categoryId, p1.price DESC
LIMIT 5) AS u;
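To make the intended result concrete, here is a minimal Python sketch of what the two queries above are meant to return. It is only a model of the semantics, not Couchbase code: the helper name top_products_per_category and the sample documents are made up for illustration, and the order in which categories come back from N1QL is not guaranteed the way it is here.

```python
from collections import defaultdict


def top_products_per_category(products, category_ids, limit=5):
    """Return up to `limit` highest-priced products for each requested category."""
    by_category = defaultdict(list)
    for product in products:
        by_category[product["categoryId"]].append(product)

    result = []
    # dict.fromkeys drops duplicate ids while keeping first-seen order,
    # which plays the role of SELECT DISTINCT RAW over $categoryIds.
    for category_id in dict.fromkeys(category_ids):
        ranked = sorted(by_category.get(category_id, []),
                        key=lambda p: p["price"], reverse=True)
        result.extend(ranked[:limit])  # LIMIT applied per category
    return result


# Hypothetical sample documents.
products = [
    {"id": i, "categoryId": cat, "price": price}
    for i, (cat, price) in enumerate(
        [("cat1", 10), ("cat1", 30), ("cat1", 20), ("cat2", 5), ("cat3", 99)]
    )
]
top = top_products_per_category(products, ["cat1", "cat2", "cat1"], limit=2)
print([(p["categoryId"], p["price"]) for p in top])
```

The duplicate "cat1" in the parameter is collapsed before the per-category lookup, which is the reason the second query adds the DISTINCT step.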

Related

How to select MySql one random row from each group [duplicate]

I have a database with an Items table that looks something like this:
id
name
category (int)
There are several hundred thousand records. Each item can be in one of 7 different categories, which correspond to a categories table:
id
category
I want a query that chooses 1 random item from each category. What's the best way of approaching that? I know to use ORDER BY rand() and LIMIT 1 for similar random queries, but I've never done something like this.
This query returns all items joined to categories in random order:
SELECT
c.id AS cid, c.category, i.id AS iid, i.name
FROM categories c
INNER JOIN items i ON c.id = i.category
ORDER BY RAND()
To restrict each category to one, wrap the query in a partial GROUP BY:
SELECT * FROM (
SELECT
c.id AS cid, c.category, i.id AS iid, i.name
FROM categories c
INNER JOIN items i ON c.id = i.category
ORDER BY RAND()
) AS shuffled_items
GROUP BY cid
Note that when a query has both GROUP BY and ORDER BY clause, the grouping is performed before sorting. This is why I have used two queries: the first one sorts the results, the second one groups the results.
I understand that this query isn't going to win any race. I am open to suggestions.
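For comparison, here is a sketch of a different technique that avoids the partial GROUP BY: number the rows of each category in random order with ROW_NUMBER() and keep row 1. It is written against SQLite through Python's sqlite3 module so it can be run as-is (this needs SQLite 3.25 or later for window functions). The category names are invented sample data, and on MySQL 8+ or a recent MariaDB you would presumably write RAND() instead of RANDOM().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, category INTEGER);
    INSERT INTO categories VALUES (1, 'books'), (2, 'music'), (3, 'games');
    INSERT INTO items VALUES
        (1, 'A', 1), (2, 'B', 1), (3, 'C', 1),
        (4, 'D', 2), (5, 'E', 2), (6, 'F', 2),
        (7, 'G', 3), (8, 'H', 3), (9, 'I', 3);
""")

# Shuffle the items inside each category, then keep the first one per category.
QUERY = """
    SELECT cid, category, iid, name
    FROM (
        SELECT c.id AS cid, c.category, i.id AS iid, i.name,
               ROW_NUMBER() OVER (PARTITION BY c.id ORDER BY RANDOM()) AS rn
        FROM categories c
        INNER JOIN items i ON c.id = i.category
    ) AS shuffled_items
    WHERE rn = 1
    ORDER BY cid
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Unlike the GROUP BY version, this does not depend on which row the engine happens to pick for the non-grouped columns.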
Here is a simple solution. Let's suppose you have this table.
id name category
1 A 1
2 B 1
3 C 1
4 D 2
5 E 2
6 F 2
7 G 3
8 H 3
9 I 3
Use this query
select
c.id,
c.category,
(select name from category where category = c.category group by id order by rand() limit 1) as CatName
from category as c
group by category
Try this
SELECT id, name, category from Items where
(
select count(*) from Items i where i.category = Items.category
GROUP BY i.category ORDER BY rand()
) <= 1
REF: http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/
Change order of the original table (random order), before final select:
select * from
(select category, id, name from items order by rand()) as tab
group by 1
Please note: in the following example I am assuming your table is named "items" not "Items" because you also said the other table was named "categories" (second table name not capitalized).
The SQL for what you want to do would roughly be:
SELECT items.id AS item_id,
items.name AS item_name,
items.category AS item_category_id,
categories.id AS category_id,
categories.category AS category_name
FROM items, categories
WHERE items.category = categories.id
ORDER BY rand()
LIMIT 1

Return the company with most film in a genre

I am working on a project at my university where I need to write a query against the database. I want the query to return the company with the most movies in a given genre. At the moment I have this query, but it returns only one company, and there may well be more than one.
SELECT CompanyID, CategoryID, COUNT(*) as NumberOfMovies
FROM Movie
NATURAL JOIN CategoryFilm
NATURAL JOIN Category
NATURAL JOIN Company
GROUP BY CategoryID, CompanyID
Order by NumberOfMovies DESC LIMIT 1
I believe I will need a "having" in here.
Please try this. It may be because you added LIMIT 1, which shows only the first retrieved record:
SELECT CompanyID, CategoryID, COUNT(*) as NumberOfMovies
FROM Movie
NATURAL JOIN CategoryFilm
NATURAL JOIN Category
NATURAL JOIN Company
GROUP BY CategoryID, CompanyID
Order by NumberOfMovies DESC
I assume by "category" you mean "genre" -- or that they are the same thing.
Do not use NATURAL JOIN. It does not even use properly declared foreign key relationships, instead relying merely on name similarity between tables. It is dangerous because the columns used are not specified and can introduce hard-to-debug errors. I often refer to it as an "abomination" because it does not take table declarations into account.
If you have a given category, then I would expect a WHERE clause:
SELECT CompanyID, COUNT(*) as NumberOfMovies
FROM Movie m JOIN
CategoryFilm cf
ON cf.movie_id = m.movie_id JOIN
Company c
ON c.company_id = m.company_id
WHERE cf.category_id = ?
GROUP BY CompanyID
ORDER BY NumberOfMovies DESC
LIMIT 1;
If you want to allow ties, you can use window function rank():
select *
from (
select
co.companyID,
ca.categoryID,
count(*) NumberOfMovies,
rank() over(partition by ca.categoryID order by count(*) desc) rn
from movie m
inner join categoryFilm cf on cf.movieID = m.movieID
inner join category ca on ca.categoryID = cf.categoryID
inner join company co on co.companyID = m.companyID
group by co.companyID, ca.categoryID
) t
where rn = 1
order by categoryID
This gives you the top company for each and every category, ties included. If you want to filter on a given category, you can just add a where clause to the inner query.
Side note: do not use natural joins: they are error-prone. I rewrote the query to use inner joins instead (I made a few assumptions on the relations).
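Here is a small runnable sketch of the rank() approach, using SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The schema is cut down to the two tables the count actually needs, the column names follow the assumptions made in the query above, and the sample rows are made up. The count and the ranking are split into two CTEs here purely for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (movieID INTEGER PRIMARY KEY, companyID INTEGER);
    CREATE TABLE categoryFilm (movieID INTEGER, categoryID INTEGER);
    -- Companies 10 and 20 tie in category 1 with two movies each.
    INSERT INTO movie VALUES (1, 10), (2, 10), (3, 20), (4, 20), (5, 30);
    INSERT INTO categoryFilm VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (5, 2);
""")

QUERY = """
    WITH counts AS (
        SELECT m.companyID, cf.categoryID, COUNT(*) AS NumberOfMovies
        FROM movie m
        INNER JOIN categoryFilm cf ON cf.movieID = m.movieID
        GROUP BY m.companyID, cf.categoryID
    ),
    ranked AS (
        SELECT counts.*,
               RANK() OVER (PARTITION BY categoryID
                            ORDER BY NumberOfMovies DESC) AS rn
        FROM counts
    )
    SELECT companyID, categoryID, NumberOfMovies
    FROM ranked
    WHERE rn = 1
    ORDER BY categoryID, companyID
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # both tied companies appear for category 1
```

Because RANK() gives tied rows the same rank, filtering on rn = 1 keeps every company that shares the top count, which LIMIT 1 cannot do.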

Find users with most number of common pages they liked

I am trying to find pairs of users who liked the same pages and list the ones who have the most common page likes at the top.
For simplicity I am considering the following table schema
Likes (LikeID, UserID)
LikeDetail (LikeDetailID, LikeID, PageID)
I am trying to find pairs of users with most number of common page likes ordered descending. E.g User1 and User2 have liked 3 pages in common.
I would like the result set of the query to be
UserID1 UserID2 NoOfCommonLikes
2 3 10
4 3 8
1 5 4
I am guessing it would need aggregation, join and aliases however I needed to rename a table twice using AS which did not work for me.
Any tip would be appreciated in MySQL, or SQL Server.
In SQL Server and MySQL 8+, you can use a CTE which JOINs the Likes and LikeDetail table, and then self-JOIN that where PageID is the same but UserID is not, and then grouping on the two userID values:
WITH CTE AS
(SELECT l.UserId, d.PageID
FROM Likes l
JOIN LikeDetail d ON d.LikeID = l.likeID)
SELECT l1.UserId AS UserID1, l2.UserID AS UserID2, COUNT(*) AS NoOfCommonLikes
FROM CTE l1
JOIN CTE l2 ON l2.PageID = l1.PageID AND l2.UserID < l1.UserID
GROUP BY l1.UserID, l2.UserID
ORDER BY COUNT(*) DESC
In versions of MySQL prior to 8.0, which do not support CTEs, you need to repeat the CTE definition as a derived table on both sides of the JOIN to achieve the same result:
SELECT l1.UserId AS UserID1, l2.UserID AS UserID2, COUNT(*) AS NoOfCommonLikes
FROM (SELECT l.UserId, d.PageID
FROM Likes l
JOIN LikeDetail d ON d.LikeID = l.likeID) l1
JOIN (SELECT l.UserId, d.PageID
FROM Likes l
JOIN LikeDetail d ON d.LikeID = l.likeID) l2 ON l2.PageID = l1.PageID AND l2.UserID < l1.UserID
GROUP BY l1.UserID, l2.UserID
ORDER BY COUNT(*) DESC
Note that we use < in the UserID comparison rather than != to avoid getting duplicate rows (e.g. for (UserID1, UserID2) = (1, 2) and (UserID1, UserID2) = (2, 1)).
I've made a small demo on dbfiddle which demonstrates the queries.
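If you want to try this locally, here is a minimal sketch that runs the CTE version against SQLite through Python's sqlite3 module. The sample rows are made up, the extra ORDER BY columns are only there to make the output deterministic, and the common_likes helper is hypothetical; it exists just to show the effect of < versus != side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Likes (LikeID INTEGER PRIMARY KEY, UserID INTEGER);
    CREATE TABLE LikeDetail (LikeDetailID INTEGER PRIMARY KEY,
                             LikeID INTEGER, PageID INTEGER);
    INSERT INTO Likes VALUES (1, 1), (2, 2), (3, 3);
    -- user 1 likes pages 100-102, user 2 likes 100-101, user 3 likes 101-103
    INSERT INTO LikeDetail VALUES
        (1, 1, 100), (2, 1, 101), (3, 1, 102),
        (4, 2, 100), (5, 2, 101),
        (6, 3, 101), (7, 3, 102), (8, 3, 103);
""")

QUERY = """
    WITH CTE AS (
        SELECT l.UserID, d.PageID
        FROM Likes l
        JOIN LikeDetail d ON d.LikeID = l.LikeID
    )
    SELECT l1.UserID AS UserID1, l2.UserID AS UserID2,
           COUNT(*) AS NoOfCommonLikes
    FROM CTE l1
    JOIN CTE l2 ON l2.PageID = l1.PageID AND l2.UserID {op} l1.UserID
    GROUP BY l1.UserID, l2.UserID
    ORDER BY COUNT(*) DESC, UserID1, UserID2
"""


def common_likes(op):
    # `op` is a fixed operator string chosen in this script, never user input.
    return conn.execute(QUERY.format(op=op)).fetchall()


print(common_likes("<"))        # each pair reported once
print(len(common_likes("!=")))  # each pair reported twice
```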

Query to select random values with inner join on three tables

I have a database with three tables,
person: id, bio, name
book: id, id_person, title, info
file: id, id_book, location
Other information: Book is about ~50,000 rows, File is about ~ 300,000 rows.
What I'm trying to do is to select 12 different authors and select just one book and from that book select location from the table file.
What I tried is the following:
SELECT DISTINCT(`person`.`id`), `person`.`name`, `book`.`id`, `book`.`title`, `book`.`info`, `file`.`location`
FROM `person`
INNER JOIN `book`
ON `book`.`id_person` = `person`.`id`
INNER JOIN `file`
ON `file`.`id_book` = `book`.`id`
LIMIT 12
I have learned that DISTINCT does not work the way one might expect. Or am I missing something? The above code returns several books from the same author before moving on to the next one, which is NOT what I want. I want 1 book from each of 12 different authors.
What would be the correct way to retrieve this information from the database? Also, I want to retrieve 12 random people, not people that are stored in consecutive order in the database. I could not formulate any query with rand() since I couldn't even get different authors.
I use MariaDB. And I would appreciate any help, especially help that allows to me do this with great performance.
In MySQL, you can do this, in practice, using GROUP BY
SELECT p.`id`, p.`name`, b.`id`, b.`title`, b.`info`, f.`location`
FROM `person` p INNER JOIN
`book` b
ON b.`id_person` = p.`id` INNER JOIN
`file` f
ON f.id_book = b.id
GROUP BY p.id
ORDER BY rand()
LIMIT 12;
However, this is not guaranteed to return the non-id values from the same row (although it does in practice). And, although the authors are random, the books and locations are not.
The SQL Query to do this consistently is a bit more complicated:
SELECT p.`id`, p.`name`, b.`id`, b.`title`, b.`info`,
(SELECT f.location
FROM file f
WHERE f.id_book = b.id
ORDER BY rand()
LIMIT 1
) as location
FROM (SELECT p.*,
(SELECT b.id
FROM book b
WHERE b.id_person = p.id
ORDER BY rand()
LIMIT 1
) as book_id
FROM person p
ORDER BY rand()
LIMIT 12
) p INNER JOIN
book b
ON b.id = p.book_id ;
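Here is a runnable sketch of the same shape of query against SQLite through Python's sqlite3 module, so RANDOM() stands in for rand(). The tables are trimmed to the columns the query needs, the data is generated, and the limit is 3 instead of 12 to keep the sample small. One thing to be aware of: a person without any book gets a NULL book_id and drops out in the INNER JOIN, so on real data you may get fewer rows than the limit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, id_person INTEGER, title TEXT);
    CREATE TABLE file (id INTEGER PRIMARY KEY, id_book INTEGER, location TEXT);
""")

# Made-up data: 4 authors, 2 books each, 2 files per book.
for pid in range(1, 5):
    conn.execute("INSERT INTO person VALUES (?, ?)", (pid, f"author{pid}"))
    for n in range(2):
        book_id = pid * 10 + n
        conn.execute("INSERT INTO book VALUES (?, ?, ?)",
                     (book_id, pid, f"title{book_id}"))
        for k in range(2):
            conn.execute("INSERT INTO file (id_book, location) VALUES (?, ?)",
                         (book_id, f"/files/{book_id}/{k}"))

QUERY = """
    SELECT picked.id, picked.name, b.id, b.title,
           (SELECT f.location FROM file f
            WHERE f.id_book = b.id
            ORDER BY RANDOM() LIMIT 1) AS location
    FROM (SELECT pe.id, pe.name,
                 (SELECT bk.id FROM book bk
                  WHERE bk.id_person = pe.id
                  ORDER BY RANDOM() LIMIT 1) AS book_id
          FROM person pe
          ORDER BY RANDOM()
          LIMIT 3) AS picked
    INNER JOIN book b ON b.id = picked.book_id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each level picks its own random row: first the authors, then one book per author, then one file per book.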

mysql advanced group query

I need some help figuring out a query
I have 3 tables
sources
id, name, rank
origin
id, source_id (FK to sources id), name
One source can have many origins
product
id, origin_id (FK to origin id), name, time_added
One origin can have many products
Now, what I want is to select the most recent products per source, ordered by rank descending
Any suggestions?
This should do as you have requested, though without sample output it's hard to be 100% certain. Inner query selects products linked to the source id ordered by the date added from newest to oldest, and in turn that's joined to sources and grouped.
SELECT
*
FROM sources AS s
INNER JOIN (
SELECT
origin.source_id,
product.*
FROM origin
INNER JOIN product
ON product.origin_id = origin.id
ORDER BY time_added DESC
) AS productsOrdered
ON productsOrdered.source_id = s.id
ORDER BY s.rank DESC, productsOrdered.time_added DESC
This avoids potentially expensive operations, as the inner select should be pretty fast and can be limited as required.
A typical way of doing this is to:
1. Find the MAX(time_added) for each origin
2. Get the product's id for each of these origins
3. Join with the sources and origin tables to retrieve all columns
Note that this fails if there are origins with multiple records with the exact same time_added.
SQL Statement
SELECT *
FROM sources s
INNER JOIN origin o ON o.source_id = s.id
INNER JOIN product p ON p.origin_id = o.id
INNER JOIN (
SELECT id
FROM product p
INNER JOIN (
SELECT origin_id
, MAX(time_added) AS time_added
FROM product p
GROUP BY
origin_id
) pmax ON pmax.origin_id = p.origin_id
AND pmax.time_added = p.time_added
) pmax ON pmax.id = p.id
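Here is a runnable sketch of the MAX(time_added) join, using SQLite through Python's sqlite3 module. The sample rows are made up, the two nested pmax subqueries are collapsed into a single join for brevity, and an ORDER BY on rank is added since the question asks for it. As in the steps above, "most recent" is computed per origin, so a source with several origins returns one product per origin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sources (id INTEGER PRIMARY KEY, name TEXT, "rank" INTEGER);
    CREATE TABLE origin (id INTEGER PRIMARY KEY, source_id INTEGER, name TEXT);
    CREATE TABLE product (id INTEGER PRIMARY KEY, origin_id INTEGER,
                          name TEXT, time_added TEXT);
    INSERT INTO sources VALUES (1, 's1', 5), (2, 's2', 9);
    INSERT INTO origin VALUES (1, 1, 'o1'), (2, 2, 'o2'), (3, 2, 'o3');
    INSERT INTO product VALUES
        (1, 1, 'p1', '2020-01-01'), (2, 1, 'p2', '2020-02-01'),
        (3, 2, 'p3', '2020-03-01'),
        (4, 3, 'p4', '2020-01-15'), (5, 3, 'p5', '2020-04-01');
""")

QUERY = """
    SELECT s.name, o.name, p.name, p.time_added
    FROM sources s
    INNER JOIN origin o ON o.source_id = s.id
    INNER JOIN product p ON p.origin_id = o.id
    INNER JOIN (
        -- newest timestamp per origin
        SELECT origin_id, MAX(time_added) AS time_added
        FROM product
        GROUP BY origin_id
    ) pmax ON pmax.origin_id = p.origin_id
          AND pmax.time_added = p.time_added
    ORDER BY s."rank" DESC, p.time_added DESC
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

If two products of the same origin share the exact same time_added, both come back, which is the caveat mentioned above.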
SELECT o.id, count(o.id) as numOfProdFromOrig, p.id, p.name, p.time_added, s.rank
FROM product as p NATURAL JOIN sources as s NATURAL JOIN origin as o
GROUP BY (numOfProdFromOrig)
ORDER BY s.rank DESC
select b.id,(select p.name from origin o inner join product p
on p.origin_id = o.id where o.source_id = b.id order by time_added desc limit 1) as product_name
from sources b ;
Try this: