How to use "group by" before "join" on mysql? - mysql

I have the following query, which returns duplicate user_ids. I don't want to see a user more than once, so I decided to use GROUP BY as follows:
SELECT u.`ID`, u.`user_login`, u.`user_registered`, u.`display_name`
FROM purchase_key p
LEFT JOIN users u ON p.user_id = u.id
WHERE ( `product_id` = 1 OR `product_id` = 2 )
AND `create_date` <= '2015-09-20' AND `create_date` >= '2014-09-01'
group by p.user_id order by p.`create_date` asc
But is there a way to do the grouping before the join?

Not sure why you want to do this. It is unlikely to help performance unless the indexes are very poor.
I'm also not sure the LEFT OUTER JOIN is useful, as the only fields your query returns come from the left-joined table. If no user were found for some purchases, you would just get rows of NULLs.
That said, you can use a subquery to get the list of users, with the max create date for each user within the required date range, and then INNER JOIN it to users. (The MAX is not strictly necessary in MySQL, but most flavours of SQL give an error if you return fields that are neither in the GROUP BY clause nor aggregated.) This gives the following:
SELECT u.ID, u.user_login, u.user_registered, u.display_name
FROM
(
SELECT user_id, MAX(create_date) AS max_create_date
FROM purchase_key
WHERE product_id IN (1, 2)
AND create_date BETWEEN '2014-09-01' AND '2015-09-20'
GROUP BY user_id
) p
INNER JOIN users u ON p.user_id = u.id
ORDER BY p.max_create_date ASC
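
The pre-aggregation pattern above can be tried on toy data. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL; the rows are made up for illustration, and only the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, user_login TEXT);
CREATE TABLE purchase_key (user_id INTEGER, product_id INTEGER, create_date TEXT);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO purchase_key VALUES
    (1, 1, '2014-10-01'),
    (1, 2, '2015-03-15'),
    (2, 1, '2014-12-24'),
    (3, 3, '2015-01-01'),
    (2, 2, '2016-01-01');
""")

# The derived table p filters and groups purchases first; users is joined afterwards,
# so each user appears at most once in the result.
rows = conn.execute("""
SELECT u.id, u.user_login, p.max_create_date
FROM (
    SELECT user_id, MAX(create_date) AS max_create_date
    FROM purchase_key
    WHERE product_id IN (1, 2)
      AND create_date BETWEEN '2014-09-01' AND '2015-09-20'
    GROUP BY user_id
) p
INNER JOIN users u ON p.user_id = u.id
ORDER BY p.max_create_date ASC
""").fetchall()

for row in rows:
    print(row)
```

With this data, carol is filtered out (wrong product) and bob's 2016 purchase is ignored (out of range), so each remaining user is listed once, ordered by their latest qualifying purchase.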

SELECT u.`ID`, u.`user_login`, u.`user_registered`, u.`display_name`
FROM (SELECT * FROM purchase_key GROUP BY user_id ) p
LEFT JOIN users u ON p.user_id = u.id
WHERE ( `product_id` = 1 OR `product_id` = 2 )
AND `create_date` <= '2015-09-20' AND `create_date` >= '2014-09-01'
order by p.`create_date` asc
Hope this helps

If you don't want to see a user more than once, try using DISTINCT in your first line.

Related

mysql select doesn't include data which is compared by current timestamp

Learning to write sql queries and facing some issues.
The query:
SELECT *
FROM competitions c
INNER JOIN (
SELECT competitionId, SUM(quantity) AS quantity
FROM tickets
GROUP BY competitionId
) t ON t.competitionId = c.id
WHERE
c.winnerId IS NULL
AND t.quantity = c.maxEntries
OR
CURRENT_TIMESTAMP() >= c.endAt;
This returns only one result but should return two.
CURRENT_TIMESTAMP() >= c.endAt;
should also match one more record, but this condition seems to be skipped for some reason.
The competition table records:
The record marked with a red square does match the WHERE condition. Why isn't it included?
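Changing from INNER JOIN to LEFT JOIN gives the desired results. An INNER JOIN drops any competition that has no matching rows in tickets, so such a competition never reaches the WHERE clause, even if its endAt has passed. The query below also adds parentheses so that the winnerId check applies to both alternatives.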
The query:
SELECT *
FROM competitions c
LEFT JOIN
(SELECT competitionId, SUM(quantity) AS quantity
FROM tickets GROUP BY competitionId
) t ON t.competitionId = c.id
WHERE
c.winnerId IS NULL AND
(t.quantity=c.maxEntries OR CURRENT_TIMESTAMP() >= c.endAt)
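
To see why the join type matters here, the following sketch runs the same query with INNER and LEFT joins. It uses Python's sqlite3 module in place of MySQL and assumes, since the question's data is not shown, that the missing competition simply has no rows in tickets; the two sample competitions are invented.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE competitions (id INTEGER PRIMARY KEY, maxEntries INTEGER,
                           winnerId INTEGER, endAt TEXT);
CREATE TABLE tickets (competitionId INTEGER, quantity INTEGER);
-- competition 1 is sold out; competition 2 has ended but never sold a ticket
INSERT INTO competitions VALUES (1, 10, NULL, '2999-01-01 00:00:00'),
                                (2, 10, NULL, '2000-01-01 00:00:00');
INSERT INTO tickets VALUES (1, 4), (1, 6);
""")

# Same query text for both runs; only the join type changes.
QUERY = """
SELECT c.id
FROM competitions c
{join} JOIN (
    SELECT competitionId, SUM(quantity) AS quantity
    FROM tickets
    GROUP BY competitionId
) t ON t.competitionId = c.id
WHERE c.winnerId IS NULL
  AND (t.quantity = c.maxEntries OR CURRENT_TIMESTAMP >= c.endAt)
ORDER BY c.id
"""

inner_ids = [r[0] for r in conn.execute(QUERY.format(join="INNER"))]
left_ids = [r[0] for r in conn.execute(QUERY.format(join="LEFT"))]

print("INNER JOIN:", inner_ids)
print("LEFT JOIN: ", left_ids)
```

Competition 2 has no ticket rows, so the INNER JOIN removes it before the WHERE clause is evaluated; with the LEFT JOIN it survives with a NULL quantity and is matched by the endAt condition.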

Mysql count result in "where" clause

I'm facing a little problem with mysql where clause.
This is the query:
SELECT u.id user
, p.id product_purchased
, p.name product_name
, pl.store_id store
, COUNT(*) occurrences
, total_spent
, total_product_purchased
, pl.registration
FROM purchases_log pl
JOIN user u
ON pl.user_id = u.id
JOIN product p
ON pl.product_id = p.id
JOIN
( SELECT user_id
, SUM(price) total_spent
, COUNT(product_id) total_product_purchased
FROM purchases_log pl
GROUP
BY user_id
) t1
ON u.id = t1.user_id
WHERE pl.store_id IN (1,2,3)
AND occurrences > 1
GROUP
BY user
, product_name
ORDER
BY u.id ASC
, pl.registration ASC;
This is the output error:
Error Code: 1054. Unknown column 'occurrences' in 'where clause' 0.067 sec
I have already tried assign AS to occurrences or using pl.
So, can someone explain me how to correctly define the result of a count function in where clause?
You need to use HAVING instead of WHERE, because GROUP BY is applied after the WHERE clause, so WHERE doesn't know about any group or aggregate columns, e.g.:
SELECT u.id user,p.id product_purchased, p.name product_name, pl.store_id store, COUNT(*) AS occurrences, total_spent, total_product_purchased, pl.registration
FROM purchases_log pl
JOIN user u ON pl.user_id=u.id
JOIN product p ON pl.product_id=p.id
JOIN (SELECT user_id, SUM(price) AS total_spent,COUNT(product_id) AS total_product_purchased FROM purchases_log pl GROUP BY user_id) t1 ON u.id=t1.user_id
WHERE pl.store_id IN (1,2,3)
GROUP BY user, product_name
HAVING COUNT(*) > 1
ORDER BY u.id ASC, pl.registration ASC;
Update
If a user has more than one associated product, it's a good idea to add all the non-aggregate columns to GROUP BY to get every combination of user and product. The current query will not return all the combinations.
For further optimization, as @strawberry has suggested, you can run EXPLAIN to see which indexes are used and whether you need to create any new index.
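
The split between row filtering (WHERE) and group filtering (HAVING) can be shown on a cut-down table. This is a sketch using Python's sqlite3 module rather than MySQL, with an invented three-column version of purchases_log.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases_log (user_id INTEGER, product_id INTEGER, store_id INTEGER);
INSERT INTO purchases_log VALUES
    (1, 10, 1), (1, 10, 2), (1, 11, 1), (2, 10, 3), (2, 10, 4);
""")

rows = conn.execute("""
SELECT user_id, product_id, COUNT(*) AS occurrences
FROM purchases_log
WHERE store_id IN (1, 2, 3)      -- row filter, applied before grouping
GROUP BY user_id, product_id
HAVING COUNT(*) > 1              -- group filter, applied after grouping
ORDER BY user_id, product_id
""").fetchall()

for row in rows:
    print(row)
```

User 2 bought product 10 twice, but one of those purchases was in store 4, which WHERE removes before the count is taken, so only user 1 with product 10 passes the HAVING condition.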

Count the number of times an index exists using a query containing multiple EXIST() statements

My query gets the results of these products based on whether they exist in a separate index table. I am trying to get a count of all the instances where they exist so I can ORDER the results by relevance. Everything I try seems to return the variable @priority as 0. Any ideas?
Maybe it is better to use join statements?
Thank you for your help. Here is my MySQL query:
SELECT `products` . * , @priority
FROM `products`
LEFT JOIN productstypes_index ON productstypes_index.product_id = products.id
WHERE (
EXISTS (
SELECT *
FROM `productstypes_index`
WHERE `productstypes_index`.`product_id` = `products`.`id`
AND `productstypes_index`.`_type_id` = '1'
)
AND (
(
(
EXISTS (
SELECT @priority := COUNT( * )
FROM `producthashtags_index`
WHERE `producthashtags_index`.`product_id` = `products`.`id`
AND `producthashtags_index`.`producthashtag_id` = '43'
)
)
AND (
EXISTS (
SELECT @priority := COUNT( * )
FROM `producthashtags_index`
WHERE `producthashtags_index`.`product_id` = `products`.`id`
AND `producthashtags_index`.`producthashtag_id` = '11'
)
)
)
)
)
ORDER BY `updated_at` DESC;
You could do without those EXISTS conditions, and without variables. Also, a left join makes no sense if you have an EXISTS condition on the joined table. You might as well use the more efficient inner join and put the extra type condition in the join condition.
The priority can be calculated by a count over the hash tags, but only those with id in ('43', '11').
SELECT products.*,
count(distinct producthashtags_index.producthashtag_id) priority
FROM products
INNER JOIN productstypes_index
ON productstypes_index.product_id = products.id
AND productstypes_index._type_id = '1'
INNER JOIN producthashtags_index
ON producthashtags_index.product_id = products.id
AND producthashtags_index.producthashtag_id in ('43', '11')
GROUP BY products.id
ORDER BY updated_at DESC;
MySQL ignores the SELECT list in an EXISTS subquery, so it makes no difference what you type in there. This is documented in the MySQL manual section on subqueries with EXISTS or NOT EXISTS.
An approach using joins would look like below:
SELECT p.id,
COUNT(case when phi.product_id is not null then 1 end) AS instances
FROM products p
INNER JOIN productstypes_index pti ON pti.product_id = p.id AND pti.`_type_id` = 1
LEFT JOIN producthashtags_index phi ON phi.product_id = p.id AND phi.producthashtag_id IN (11,43)
GROUP BY p.id
ORDER BY instances DESC;
I have removed extra backticks where I believe they are not necessary. Also, if the id columns in your tables are integers, you do not need quotation marks.
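
The join-based count can be checked against a small data set. The sketch below uses Python's sqlite3 module in place of MySQL; the sample products are invented, and COUNT(phi.product_id) is used as a shorter equivalent of the CASE expression above, since COUNT skips NULLs.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE productstypes_index (product_id INTEGER, _type_id INTEGER);
CREATE TABLE producthashtags_index (product_id INTEGER, producthashtag_id INTEGER);
INSERT INTO products VALUES
    (1, 'both tags'), (2, 'one tag'), (3, 'no tags'), (4, 'wrong type');
INSERT INTO productstypes_index VALUES (1, 1), (2, 1), (3, 1), (4, 2);
INSERT INTO producthashtags_index VALUES (1, 43), (1, 11), (2, 43), (2, 99), (4, 43);
""")

# The INNER JOIN enforces the type condition; the LEFT JOIN keeps products
# without matching hashtags so they get a priority of 0 instead of disappearing.
rows = conn.execute("""
SELECT p.id, COUNT(phi.product_id) AS priority
FROM products p
INNER JOIN productstypes_index pti
        ON pti.product_id = p.id AND pti._type_id = 1
LEFT JOIN producthashtags_index phi
       ON phi.product_id = p.id AND phi.producthashtag_id IN (11, 43)
GROUP BY p.id
ORDER BY priority DESC, p.id
""").fetchall()

for row in rows:
    print(row)
```

Replacing the LEFT JOIN with an INNER JOIN, as in the first answer, would drop product 3 entirely rather than listing it with a priority of 0; which behaviour is wanted depends on whether untagged products should appear.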

MySql order by clause not working

In mysql query I use order by, but it is not working.
When I do this
SELECT t.id,t.user_id,t.title,c.comment,d.has_answer,IF(c.id IS NULL, t.date_created, d.recent_date) recent_date,MIN(i.id) image_id
FROM threads t
LEFT JOIN comments c ON c.thread_id = t.id
INNER JOIN (
SELECT thread_id, MAX(date_sent) recent_date, MAX(is_answer) has_answer
FROM comments
GROUP BY thread_id
) d ON c.id IS NULL OR (d.thread_id = c.thread_id AND d.recent_date = c.date_sent)
LEFT JOIN thread_images i ON t.id = i.thread_id
WHERE t.user_id = t.user_id
GROUP BY t.id
ORDER BY d.recent_date DESC
LIMIT 0, 10
It doesn't properly order them. But if I do this:
SELECT *
FROM (
SELECT t.id,t.user_id,t.title,c.comment,d.has_answer,IF(c.id IS NULL, t.date_created, d.recent_date) recent_date,MIN(i.id) image_id
FROM threads t
LEFT JOIN comments c ON c.thread_id = t.id
INNER JOIN (
SELECT thread_id, MAX(date_sent) recent_date, MAX(is_answer) has_answer
FROM comments
GROUP BY thread_id
) d ON c.id IS NULL OR (d.thread_id = c.thread_id AND d.recent_date = c.date_sent)
LEFT JOIN thread_images i ON t.id = i.thread_id
WHERE t.user_id = t.user_id
GROUP BY t.id
LIMIT 0, 10) qwerty
ORDER BY recent_date DESC
Then it does work. Why does the top one not work, and is the second way the best way to fix that?
Thanks
Those two statements are ordering by two different things.
The second statement is ordering by the result of an expression in the SELECT list.
But the first statement specifies ordering by a value of recent_date returned by the inline view d; if you remove "d." from in front of recent_date, then the ORDER BY clause would reference the alias assigned to the expression in the SELECT list, as the second statement does.
Because recent_date is an alias for an expression in the SELECT list, these two are equivalent:
ORDER BY recent_date
ORDER BY IF(c.id IS NULL, t.date_created, d.recent_date)
^^
but those are significantly different from:
ORDER BY d.recent_date
         ^^
Note that the non-standard use of the GROUP BY clause may be masking some values of recent_date which are discarded by the query. This usage of the GROUP BY clause is a MySQL extension to the SQL Standard; most other relational databases would throw an error with this statement. It's possible to get MySQL to throw the same type of error by enabling the ONLY_FULL_GROUP_BY SQL mode.
Q Is the second statement the best way to fix that?
A If that statement guarantees that the resultset returned meets your specification, then it's a workable approach. (One downside is the overhead of the inline view query.)
But I strongly suspect that the second statement is really just masking the problem, not really fixing it.
SELECT t.id,t.user_id,t.title,c.comment,d.has_answer,IF(c.id IS NULL, t.date_created, d.recent_date) recent_date,MIN(i.id) image_id
FROM (threads t
LEFT JOIN comments c ON c.thread_id = t.id
INNER JOIN (
SELECT thread_id, MAX(date_sent) recent_date, MAX(is_answer) has_answer
FROM comments
GROUP BY thread_id
) d ON c.id IS NULL OR (d.thread_id = c.thread_id AND d.recent_date = c.date_sent)
LEFT JOIN thread_images i ON t.id = i.thread_id
WHERE t.user_id = t.user_id
GROUP BY t.id
LIMIT 0, 10) x
ORDER BY d.recent_date DESC
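
The difference between ordering by the SELECT-list alias and ordering by d.recent_date can be reproduced on a simplified schema. This sketch uses Python's sqlite3 module in place of MySQL, a CASE expression in place of MySQL's IF(), and invented threads and comments tables that are much smaller than the asker's; it is not the original query.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE threads (id INTEGER PRIMARY KEY, date_created TEXT);
CREATE TABLE comments (thread_id INTEGER, date_sent TEXT);
INSERT INTO threads VALUES (1, '2015-01-05'), (2, '2015-01-01'), (3, '2015-01-09');
-- thread 3 has no comments, so d.recent_date is NULL for it
INSERT INTO comments VALUES (1, '2015-01-06'), (2, '2015-01-08');
""")

# The alias recent_date deliberately shares its name with the column d.recent_date.
QUERY = """
SELECT t.id,
       CASE WHEN d.recent_date IS NULL THEN t.date_created
            ELSE d.recent_date END AS recent_date
FROM threads t
LEFT JOIN (
    SELECT thread_id, MAX(date_sent) AS recent_date
    FROM comments
    GROUP BY thread_id
) d ON d.thread_id = t.id
ORDER BY {order_by} DESC
"""

by_alias = [r[0] for r in conn.execute(QUERY.format(order_by="recent_date"))]
by_column = [r[0] for r in conn.execute(QUERY.format(order_by="d.recent_date"))]

print("ORDER BY recent_date:  ", by_alias)
print("ORDER BY d.recent_date:", by_column)
```

Thread 3 is the newest by creation date, so it sorts first when the alias is used, but it sorts last when the qualified column is used because its d.recent_date is NULL.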

SQL query to check if value doesn't exist in another table

I have a SQL query which does most of what I need it to do but I'm running into a problem.
There are 3 tables in total. entries, entry_meta and votes.
I need to get an entire row from entries when competition_id = 420 in the entry_meta table and the ID either doesn't exist in votes or it does exist but the user_id column value isn't 1.
Here's the query I'm using:
SELECT entries.* FROM entries
INNER JOIN entry_meta ON (entries.ID = entry_meta.entry_id)
WHERE 1=1
AND ( ( entry_meta.meta_key = 'competition_id' AND CAST(entry_meta.meta_value AS CHAR) = '420') )
GROUP BY entries.ID
ORDER BY entries.submission_date DESC
LIMIT 0, 25;
The votes table has 4 columns. vote_id, entry_id, user_id, value.
One option I was thinking of was to SELECT entry_id FROM votes WHERE user_id = 1 and include it in an AND clause in my query. Is this acceptable/efficient?
E.g.
AND entries.ID NOT IN (SELECT entry_id FROM votes WHERE user_id = 1)
A left join with an appropriate where clause might be useful:
SELECT
entries.*
FROM
entries
INNER JOIN entry_meta ON (entries.ID = entry_meta.entry_id)
LEFT JOIN votes ON entries.ID = votes.entry_id
WHERE 1=1
AND (
entry_meta.meta_key = 'competition_id'
AND CAST(entry_meta.meta_value AS CHAR) = '420'
AND votes.entry_id IS NULL -- This will remove any entry with votes
)
GROUP BY entries.ID
ORDER BY entries.submission_date DESC
Here's an implementation of Andrew's suggestion to use exists / not exists.
select
e.*
from
entries e
join entry_meta em on e.ID = em.entry_id
where
em.meta_key = 'competition_id'
and cast(em.meta_value as char) = '420'
and (
not exists (
select 1
from votes v
where
v.entry_id = e.ID
)
or exists (
select 1
from votes v
where
v.entry_id = e.ID
and v.user_id != 1
)
)
group by e.ID
order by e.submission_date desc
limit 0, 25;
Note: it's generally not a good idea to put a function inside a where clause (due to performance reasons), but since you're also joining on IDs you should be OK.
Also, the left join suggestion by Barranka may cause the query to return more rows than you are expecting (assuming that there is a 1:many relationship between entries and votes).
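
The two readings of the requirement give different results when an entry has votes from several users, which can be seen on a tiny data set. This sketch uses Python's sqlite3 module in place of MySQL and leaves out entry_meta for brevity; the four entries and their votes are invented. The first query is the literal "no votes, or a vote by someone other than user 1" condition from the answer above, and the second assumes the intent is "user 1 has not voted", as in the asker's NOT IN idea.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (ID INTEGER PRIMARY KEY);
CREATE TABLE votes (vote_id INTEGER PRIMARY KEY, entry_id INTEGER, user_id INTEGER);
INSERT INTO entries VALUES (1), (2), (3), (4);
-- entry 1: no votes; entry 2: only user 1; entry 3: only user 2; entry 4: users 1 and 2
INSERT INTO votes (entry_id, user_id) VALUES (2, 1), (3, 2), (4, 1), (4, 2);
""")

# Literal reading: no votes at all, or at least one vote from a user other than 1.
literal = [r[0] for r in conn.execute("""
SELECT e.ID FROM entries e
WHERE NOT EXISTS (SELECT 1 FROM votes v WHERE v.entry_id = e.ID)
   OR EXISTS (SELECT 1 FROM votes v WHERE v.entry_id = e.ID AND v.user_id != 1)
ORDER BY e.ID
""")]

# Alternative reading (an assumption about intent): user 1 has not voted on the entry.
not_voted_by_user_1 = [r[0] for r in conn.execute("""
SELECT e.ID FROM entries e
WHERE NOT EXISTS (SELECT 1 FROM votes v WHERE v.entry_id = e.ID AND v.user_id = 1)
ORDER BY e.ID
""")]

print("literal reading:    ", literal)
print("not voted by user 1:", not_voted_by_user_1)
```

Entry 4 is the case that separates the two: it has a vote from user 2, so the literal reading keeps it, but user 1 has also voted on it, so the second query leaves it out.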