I have three tables food, fav_food and food_image. The food table has details about foods, food_image has multiple images for a single food and the
fav_food table has user's favorite food ids.
food:
f_id | description
food_image
f_id | img_url | rank
fav_food
user_id | f_id
Here's what I tried:
SELECT food.f_id,
food.description,
img.minimgrank,
i.img_url AS profile_photo
FROM fav_food
INNER JOIN food
ON fav_food.f_id = food.f_id
LEFT OUTER JOIN(SELECT f_id,
Min(rank) AS minImgRank
FROM food_image
GROUP BY f_id) img
ON img.f_id = food.f_id
JOIN food_image i
ON i.rank = img.minimgrank
WHERE fav_food.user_id = ?
Now I need a query that will show the favorite foods of user with description and image. Although there is multiple images, I need to select a single image with minimum rank (Suppose rank- 1,2,3 the the image with rank 1 will be selected). So my question is how to write a faster query to achieve my goal?
Consider joining the three tables and after that using a window function such as min(rank) to get the results.
select *
from (
select a.user_id
,b.img_url
,b.rank
,min(b.rank) over(order by b.rank asc) as rnk
from fav_food a
join food_image b
on a.f_id=b.f_id
join food c
on a.f_id=c.f_id
)x
where x.rank=x.rnk
This is a situation where a correlated subquery should have better performance:
SELECT f.f_id, f.description,
fi.rank, fi.img_url AS profile_photo
FROM fav_food ff JOIN
food f
ON ff.f_id = f.f_id LEFT JOIN
food_image fi
ON fi.f_id = f.f_id AND
fi.rank = (SELECT MIN(fi2.rank) FROM food_image fi2 WHERE fi2.f_id = fi.f_id)
WHERE ff.user_id = ? ;
For performance, you want to be sure that you have an index on food_image(f_id, rank), and fav_food(user_id, f_id), as well as indexes on the primary keys.
Why is this faster and than the GROUP BY version? First, indexes can be used for the correlated query, but probably won't be for the JOIN after the GROUP BY.
Second, the GROUP BY needs to process all the data in the images table. This only needs to process the favorite foods for the given user.
Finally, ROW_NUMBER() is an option in MySQL 8+. However, this is still likely to be as fast or faster than that solution (based on my experience with other databases).
Related
I am using MySQL 5.6.
I have a SQL table with a list of users:
id name
1 Alice
2 Bob
3 John
and a SQL table with the list of gifts for each user (numbered in order of preference):
id gift rank
1 balloon 2
1 shoes 1
1 seeds 3
1 video-game 1
2 computer 2
3 shoes 2
3 hat 1
And I would like a list of the preferred gift for each user (the highest rank - if two gifts have the same rank, pick only one randomly) (bonus: if the list could be randomized, that would be perfect!):
id name gift rank
2 Bob computer 2
1 Alice shoes 1
3 John hat 1
I tried to use the clause GROUP BY but without any success.
Considering rank as a part of your data; Without using window functions or complex sub queries
SELECT u.id, u.name, g.gift
FROM users u
JOIN gifts g ON g.id = u.id
LEFT JOIN gifts g2 ON g2.id = g.id AND g2.rank > g.rank
WHERE g2.id IS NULL;
Added link http://sqlfiddle.com/#!9/62f59e/15/0
You can use row_number to get one row for each User.(Mysql 8.0+)
SELECT A.ID,NAME,GIFT,`RANK` FROM USERS A
LEFT JOIN (
SELECT ID,GIFT,`RANK` FROM
(SELECT *,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY `RANK` ASC) AS RN FROM X) X
WHERE RN =1
) B
ON A.ID= B.ID
I do not know DB what you use. And I'm not an expert in SQL(I can have some mistake in next). But I think it is not difficult.
So I can give you just advice that you have to think gradually. Let me write.
First All I need is the highest rank. So I have to get this.
SELECT MAX(RANK)
FROM GIFT
GROUP BY ID
And then I think that I need get gifts from this rank.
SELECT GIFT.*
FROM GIFT
INNER JOIN(
SELECT ID, MAX(RANK)
FROM GIFT
GROUP BY ID
) filter ON GIFT.ID = filter.ID AND GIFT.RANK = filter.RANK
I think this is the table what you want!
So If below code works, That's what you really want.
SELECT *
FROM USER
LEFT OUTER JOIN(
above table
) GIFT ON USER.ID = GIFT.ID
But Remember this, I said I'm not an expert in SQL. There can be better way.
Checkout the query
SELECT tbluser.id,name,gift,rank into tblrslt
FROM tbluser
LEFT JOIN tblgifts
ON tbluser.id = tblgifts.id order by id,rank;
SELECT tt.*
FROM tblrslt tt
INNER JOIN
(SELECT id, min(rank) AS rank
FROM tblrslt
GROUP BY id) groupedtt
ON tt.id = groupedtt.id
AND tt.rank = groupedtt.rank order by id
In MySQL versions older than 8 you have no ranking functions available. You'll select the minimum rank per user instead and use these ranks to select the gift rows. This means you access the gifts table twice.
I suggest this:
select *
fron users u
join gifts g
on g.id = u.id
and (g.id, g.rank) in (select id, min(rank) from gifts group by id)
order by u.id;
If you also want to show users without gifts, simply change the inner join to a left outer join.
I'm having trouble figuring out how to structure a SQL query. Let's say we have a User table and a Pet table. Each user can have many pets and Pet has a breed column.
User:
id | name
______|________________
1 | Foo
2 | Bar
Pet:
id | owner_id | name | breed |
______|________________|____________|_____________|
1 | 1 | Fido | poodle |
2 | 2 | Fluffy | siamese |
The end goal is to provide a query that will give me all the pets for each user that match the given where clause while allowing sort and limit parameters to be used. So the ability to limit each user's pets to say 5 and sorted by name.
I'm working on building these queries dynamically for an ORM so I need a solution that works in MySQL and Postgresql (though it can be two different queries).
I've tried something like this which doesn't work:
SELECT "user"."id", "user"."name", "pet"."id", "pet"."owner_id", "pet"."name",
"pet"."breed"
FROM "user"
LEFT JOIN "pet" ON "user"."id" = "pet"."owner_id"
WHERE "pet"."id" IN
(SELECT "pet"."id" FROM "pet" WHERE "pet"."breed" = 'poodle' LIMIT 5)
In Postgres (8.4 or later), use the window function row_number() in a subquery:
SELECT user_id, user_name, pet_id, owner_id, pet_name, breed
FROM (
SELECT u.id AS user_id, u.name AS user_name
, p.id AS pet_id, owner_id, p.name AS pet_name, breed
, row_number() OVER (PARTITION BY u.id ORDER BY p.name, pet_id) AS rn
FROM "user" u
LEFT JOIN pet p ON p.owner_id = u.id
AND p.breed = 'poodle'
) sub
WHERE rn <= 5
ORDER BY user_name, user_id, pet_name, pet_id;
When using a LEFT JOIN, you can't combine that with WHERE conditions on the left table. That forcibly converts the LEFT JOIN to a plain [INNER] JOIN (and possibly removes rows from the result you did not want removed). Pull such conditions up into the join clause.
The way I have it, users without pets are included in the result - as opposed to your query stub.
The additional id columns in the ORDER BY clauses are supposed to break possible ties between non-unique names.
Never use a reserved word like user as identifier.
Work on your naming convention. id or name are terrible, non-descriptive choices, even if some ORMs suggest this nonsense. As you can see in the query, it leads to complications when joining a couple of tables, which is what you do in SQL.
Should be something like pet_id, pet, user_id, username etc. to begin with.
With a proper naming convention we could just SELECT * in the subquery.
MySQL does not support window functions, there are fidgety substitutes ...
SELECT user.id, user.name, pet.id, pet.name, pet.breed, pet.owner_id,
SUBSTRING_INDEX(group_concat(pet.owner_id order by pet.owner_id DESC), ',', 5)
FROM user
LEFT JOIN pet on user.id = pet.owner_id GROUP BY user.id
Above is rough/untested, but this source has exactly what you need, see step 4. also you don't need any of those " 's.
This query suggests friendship based on how many words users have in common. in_common sets this threshold.
I was wondering if it was possible to make this query completely % based.
What I want to do is have user suggested to current user, if 30% of their words match.
curent_user total words 100
in_common threshold 30
some_other_user total words 10
3 out of these match current_users list.
Since 3 is 30% of 10, this is a match for the current user.
Possible?
SELECT users.name_surname, users.avatar, t1.qty, GROUP_CONCAT(words_en.word) AS in_common, (users.id) AS friend_request_id
FROM (
SELECT c2.user_id, COUNT(*) AS qty
FROM `connections` c1
JOIN `connections` c2
ON c1.user_id <> c2.user_id
AND c1.word_id = c2.word_id
WHERE c1.user_id = :user_id
GROUP BY c2.user_id
HAVING count(*) >= :in_common) as t1
JOIN users
ON t1.user_id = users.id
JOIN connections
ON connections.user_id = t1.user_id
JOIN words_en
ON words_en.id = connections.word_id
WHERE EXISTS(SELECT *
FROM connections
WHERE connections.user_id = :user_id
AND connections.word_id = words_en.id)
GROUP BY users.id, users.name_surname, users.avatar, t1.qty
ORDER BY t1.qty DESC, users.name_surname ASC
SQL fiddle: http://www.sqlfiddle.com/#!2/c79a6/9
OK, so the issue is "users in common" defined as asymmetric relation. To fix it, let's assume that in_common percentage threshold is checked against user with the least words.
Try this query (fiddle), it gives you full list of users with at least 1 word in common, marking friendship suggestions:
SELECT user1_id, user2_id, user1_wc, user2_wc,
count(*) AS common_wc, count(*) / least(user1_wc, user2_wc) AS common_wc_pct,
CASE WHEN count(*) / least(user1_wc, user2_wc) > 0.7 THEN 1 ELSE 0 END AS frienship_suggestion
FROM (
SELECT u1.user_id AS user1_id, u2.user_id AS user2_id,
u1.word_count AS user1_wc, u2.word_count AS user2_wc,
c1.word_id AS word1_id, c2.word_id AS word2_id
FROM connections c1
JOIN connections c2 ON (c1.user_id < c2.user_id AND c1.word_id = c2.word_id)
JOIN (SELECT user_id, count(*) AS word_count
FROM connections
GROUP BY user_id) u1 ON (c1.user_id = u1.user_id)
JOIN (SELECT user_id, count(*) AS word_count
FROM connections
GROUP BY user_id) u2 ON (c2.user_id = u2.user_id)
) AS shared_words
GROUP BY user1_id, user2_id, user1_wc, user2_wc;
Friendship_suggestion is on SELECT for clarity, you probably need to filter by it, so yu may just move it to HAVING clause.
I throw this option into your querying consideration... The first part of the from query is to do nothing but get the one user you are considering as the basis to find all others having common words. The where clause is for that one user (alias result OnePerson).
Then, add to the from clause (WITHOUT A JOIN) since the OnePerson record will always be a single record, we want it's total word count available, but didn't actually see how your worked your 100 to 30 threashold if another person only had 10 words to match 3... I actually think its bloat and unnecessary as you'll see later in the where of PreQuery.
So, the next table is the connections table (aliased c2) and that is normal INNER JOIN to the words table for each of the "other" people being considered.
This c2 is then joined again to the connections table again alias OnesWords based on the common word Id -- AND -- the OnesWords user ID is that of the primary user_id being compared against. This OnesWords alias is joined to the words table so IF THERE IS a match to the primary person, we can grab that "common word" as part of the group_concat().
So, now we grab the original single person's total words (still not SURE you need it), a count of ALL the words for the other person, and a count (via sum/case when) of all words that ARE IN COMMON with the original person grouped by the "other" user ID. This gets them all and results as alias "PreQuery".
Now, from that, we can join that to the user's table to get the name and avatar along with respective counts and common words, but apply the WHERE clause based on the total per "other users" available words to the "in common" with the first person's words (see... I didn't think you NEEDED the original query/count as basis of percentage consideration).
SELECT
u.name_surname,
u.avatar,
PreQuery.*
from
( SELECT
c2.user_id,
One.TotalWords,
COUNT(*) as OtherUserWords,
GROUP_CONCAT(words_en.word) AS InCommonWords,
SUM( case when OnesWords.word_id IS NULL then 0 else 1 end ) as InCommonWithOne
from
( SELECT c1.user_id,
COUNT(*) AS TotalWords
from
`connections` c1
where
c1.user_id = :PrimaryPersonBasis ) OnePerson,
`connections` c2
LEFT JOIN `connections` OnesWords
ON c2.word_id = OnesWords.word_id
AND OnesWords.user_id = OnePerson.User_ID
LEFT JOIN words_en
ON OnesWords.word_id = words_en.id
where
c2.user_id <> OnePerson.User_ID
group by
c2.user_id ) PreQuery
JOIN users u
ON PreQuery.user_id = u.id
where
PreQuery.OtherUserWords * :nPercentToConsider >= PreQuery.InCommonWithOne
order by
PreQuery.InCommonWithOne DESC,
u.name_surname
Here's a revised WITHOUT then need to prequery the total original words of the first person.
SELECT
u.name_surname,
u.avatar,
PreQuery.*
from
( SELECT
c2.user_id,
COUNT(*) as OtherUserWords,
GROUP_CONCAT(words_en.word) AS InCommonWords,
SUM( case when OnesWords.word_id IS NULL then 0 else 1 end ) as InCommonWithOne
from
`connections` c2
LEFT JOIN `connections` OnesWords
ON c2.word_id = OnesWords.word_id
AND OnesWords.user_id = :PrimaryPersonBasis
LEFT JOIN words_en
ON OnesWords.word_id = words_en.id
where
c2.user_id <> :PrimaryPersonBasis
group by
c2.user_id
having
COUNT(*) * :nPercentToConsider >=
SUM( case when OnesWords.word_id IS NULL then 0 else 1 end ) ) PreQuery
JOIN users u
ON PreQuery.user_id = u.id
order by
PreQuery.InCommonWithOne DESC,
u.name_surname
There might be some tweaking on the query, but your original query leads me to believe you can easily find simple things like alias or field name type-o instances.
Another options might be to prequery ALL users and how many respective words they have UP FRONT, then use the primary person's words to compare to anyone else explicitly ON those common words... This might be more efficient as the multiple joins would be better on the smaller result set. What if you have 10,000 users and user A has 30 words, and only 500 other users have one or more of those words in common... why compare against all 10,000... but if having up-front a simple summary of each user and how many should be an almost instant query basis.
SELECT
u.name_surname,
u.avatar,
PreQuery.*
from
( SELECT
OtherUser.User_ID,
AllUsers.EachUserWords,
COUNT(*) as CommonWordsCount,
group_concat( words_en.word ) as InCommonWords
from
`connections` OneUser
JOIN words_en
ON OneUser.word_id = words_en.id
JOIN `connections` OtherUser
ON OneUser.word_id = OtherUser.word_id
AND OneUser.user_id <> OtherUser.user_id
JOIN ( SELECT
c1.user_id,
COUNT(*) as EachUserWords
from
`connections` c1
group by
c1.user_id ) AllUsers
ON OtherUser.user_id = AllUsers.User_ID
where
OneUser.user_id = :nPrimaryUserToConsider
group by
OtherUser.User_id,
AllUsers.EachUserWords ) as PreQuery
JOIN users u
ON PreQuery.uer_id = u.id
where
PreQuery.EachUserWords * :nPercentToConsider >= PreQuery.CommonWordCount
order by
PreQuery.CommonWordCount DESC,
u.name_surname
May I suggest a different way to look at your problem?
You might look into a similarity metric, such as Cosine Similarity which will give you a much better measure of similarity between your users based on words. To understand it for your case, consider the following example. You have a vector of words A = {house, car, burger, sun} for a user u1 and another vector B = {flat, car, pizza, burger, cloud} for user u2.
Given these individual vectors you first construct another that positions them together so you can map to each user whether he/she has that word in its vector or not. Like so:
| -- | house | car | burger | sun | flat | pizza | cloud |
----------------------------------------------------------
| A | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
----------------------------------------------------------
| B | 0 | 1 | 1 | 0 | 1 | 1 | 1 |
----------------------------------------------------------
Now you have a vector for each user where each position corresponds to the value of each word to each user. Here it represents a simple count but you can improve it using different metrics based on word frequency if that applies to your case. Take a look at the most common one, called tf-idf.
Having these two vectors, you can compute the cosine similarity between them as follows:
Which basically is computing the sum of the product between each position of the vectors above, divided by their corresponding magnitude. In our example, that is 0.47, in a range that can vary between 0 and 1, the higher the most similar the two vectors are.
If you choose to go this way, you don't need to do this calculation in the database. You compute the similarity in your code and just save the result in the database. There are several libraries that can do that for you. In Python, take a look at the numpy library. In Java, look at Weka and/or Apache Lucene.
I have looked pretty hard through other SQL query questions but have been unable to narrow down a response that seems to work in my case... so here goes.
I have two mySQL tables:
players:
pid
pname
player_stats:
pid
statdate
rank
score
I would like to show output of the players name and their latest score and rank like this:
player1 | rank10 | 123345
player2 | rank15 | 480993
I have played around with max(statdate) and GROUP BY on pname but the closest I have gotten is below, which gives me the right number of rows but not the latest date (thus not the latest rank or score).
SELECT p.pname, s.rank, s.score
FROM players p INNER JOIN player_stats s ON p.pid = s.pid
GROUP BY p.pname
as mentioned this is close but the rank/score are not always the last date's
You can have an extra join in a subquery which separately get the latest date for every pid.
The subquery contains only two columns: pid and statdate. To get the other columns you need to join it with the other table.
SELECT a.pname, b.rank, b.score
FROM players a
INNER JOIN player_stats b
ON a.pid = b.pid
INNER JOIN
(
SELECT pid, MAX(statdate) statdate
FROM player_stats
GROUP BY pid
) c ON b.pid = c.pid
AND b.statdate = c.statdate
I suppose this is a very common question and possibly is duplicated, but I didn't know how to search it. This is my scenario: I have 3 tables: registers, foods and recipes. Tables structure simplified:
Registers
id | item_id | type (0 food|1 recipe)
(item_id can be whatever a food id or recipe id)
Foods
id | name | weight
Recipes
id | name | num_ingredients
What I want is to query the registers table and return the food or recipe name (and some specific fields from each table, like weight in foods and num_ingredients in recipes) based on the type field being 0 or 1.
If the tables structure is not the best, I would thank to help me find the best solution.
Thanks a lot in advance.
You need to combine the two detail tables with a UNION, and then JOIN them with the Registers table using the ID and type fields.
SELECT item_id, name, data
FROM Registers r
JOIN (SELECT 0 type, id, name, weight as data
FROM Foods
UNION
SELECT 1 type, id, name, num_ingredients as data) x
ON r.item_id = x.id AND r.type = x.type
GolezTroi's answer where he does the UNION after JOIN is probably better, though, since it doesn't create a large temporary table before joining.
You can do this like so:
select r.id, r.type, f.name
from
Registers r
inner join Foods f on f.id = r.item_id and r.type = 0
union all
select r.id, r.type, c.name
from
Registers r
inner join Recipes c on c.id = r.item_id and r.type = 1
But your table structure isn't ideal. First of all, since r.item_id can contain an id from two tables, it is impossible to add a constraint that enforces referential integrity. You'll need a trigger to check it, which is more complex and slower.
Instead, I'd choose to make the relation the other way around:
Add a register_id to Foods and Recipes. Then, you can write your query like this:
select r.id, r.type, f.name
from
Registers r
inner join Foods f on f.register_id = r.id
union all
select r.id, r.type, c.name
from
Registers r
inner join Recipes c on c.register_id = r.id
That's almost the same, but you don't need the type and it allows you to make proper foreign key constraints.