If I run the following query:
SELECT *
FROM `smp_data_log`
WHERE Post_id = 1234 AND Account_id = 1306
ORDER BY Created_time DESC
I get 7 rows back including entries with the following Created_times:
1) 1424134801
2) 1424134801
3) 1421802001
4) 3601
If I run the following query:
SELECT mytable.*
FROM (SELECT * FROM `smp_data_log` ORDER BY Created_time DESC) AS mytable
WHERE Post_id = 1234 AND Account_id = 1306
GROUP BY Post_id
I would expect to see 1424134801 come back as a single row, but instead I am seeing 3601. I would have thought this would return the latest time (as it's sorted descending). What am I doing wrong?
Your expectation is wrong, and this behavior is well documented in MySQL. You are using an extension where the select list contains columns that are not in the GROUP BY. That is a very bad habit, and one that doesn't work in other databases (except in some very special circumstances allowed by the ANSI standard). The ORDER BY in the subquery does not control which row's values are returned for each group.
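Use a join to get what you really want (matching on both Post_id and Account_id, since the question filters on both):
SELECT l.*
FROM smp_data_log l JOIN
(select post_id, account_id, max(created_time) as maxct
from smp_data_log
group by post_id, account_id
) lmax
on lmax.post_id = l.post_id and lmax.account_id = l.account_id and lmax.maxct = l.created_time
WHERE l.post_id = 1234 AND l.account_id = 1306;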
Here is the quote from the documentation:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
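The join approach can be tried end to end with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: SQLite stands in for MySQL, the id column and the extra account row are invented, and the times are the Created_time values from the question. Because two rows tie on the maximum time, both come back.

```python
import sqlite3

def latest_rows(conn, post_id, account_id):
    """Return every row whose created_time equals the per-(post, account) maximum."""
    sql = """
        SELECT l.id, l.created_time
        FROM smp_data_log l
        JOIN (SELECT post_id, account_id, MAX(created_time) AS maxct
              FROM smp_data_log
              GROUP BY post_id, account_id) lmax
          ON lmax.post_id = l.post_id
         AND lmax.account_id = l.account_id
         AND lmax.maxct = l.created_time
        WHERE l.post_id = ? AND l.account_id = ?
        ORDER BY l.id
    """
    return conn.execute(sql, (post_id, account_id)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE smp_data_log (id INTEGER PRIMARY KEY, "
             "post_id INT, account_id INT, created_time INT)")
conn.executemany(
    "INSERT INTO smp_data_log (post_id, account_id, created_time) VALUES (?, ?, ?)",
    [(1234, 1306, 1424134801), (1234, 1306, 1424134801),
     (1234, 1306, 1421802001), (1234, 1306, 3601),
     (1234, 9999, 1500000000)],  # hypothetical other account with a later time
)
print(latest_rows(conn, 1234, 1306))  # [(1, 1424134801), (2, 1424134801)]
```

Grouping by account as well as post keeps the other account's later row from hiding account 1306's latest entries.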
I have 2 tables: F_Test and d_partners.
For every “site_name”, I need the top 5 “partner_name” values (joined from d_partners) with the highest number of clicks (each record with a value in “partner_id” counts as one click).
This is my query:
select t.partner_name, t.partner_id
from F_Test as t, d_partners as t2
where t. partner_id =t2.partner_id
GROUP BY t.site_name
Order by desc limit 5
Do you think it's fine? What should I change?
Here are parts of the tables: [F_Test table] [d_partners table]
I would expect a query like this:
select sp.*
from (select t.site_name, p.partner_name, count(*) as num_clicks,
row_number() over (partition by t.site_name order by count(*) desc) as seqnum
from F_Test t join
d_partners p
on p.partner_id = t.partner_id
group by t.site_name, p.partner_name
) sp
where seqnum <= 5;
Notes:
The query uses proper, explicit, standard, readable JOIN syntax.
As a corollary: Never use commas in the FROM clause.
Use meaningful table aliases -- abbreviations for the table names -- rather than meaningless aliases.
Window functions have been available in MySQL for several years, starting with version 8.0.
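The top-N-per-group pattern can be checked with Python's sqlite3 module (SQLite 3.25 or later is needed for window functions). This is a sketch under assumptions: the table layouts and click data are invented, the count is computed in an inner subquery before ROW_NUMBER() is applied (an equivalent split of the same query), and n is lowered to 2 to suit the tiny data set. With ROW_NUMBER(), ties at the cutoff are broken arbitrarily.

```python
import sqlite3

def top_partners(conn, n):
    """Return the n partners with the most clicks for each site."""
    sql = """
        SELECT site_name, partner_name, num_clicks
        FROM (SELECT site_name, partner_name, num_clicks,
                     ROW_NUMBER() OVER (PARTITION BY site_name
                                        ORDER BY num_clicks DESC) AS seqnum
              FROM (SELECT t.site_name, p.partner_name, COUNT(*) AS num_clicks
                    FROM F_Test t
                    JOIN d_partners p ON p.partner_id = t.partner_id
                    GROUP BY t.site_name, p.partner_name) counts) ranked
        WHERE seqnum <= ?
        ORDER BY site_name, num_clicks DESC
    """
    return conn.execute(sql, (n,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d_partners (partner_id INTEGER PRIMARY KEY, partner_name TEXT)")
conn.execute("CREATE TABLE F_Test (site_name TEXT, partner_id INT)")  # one row = one click
conn.executemany("INSERT INTO d_partners VALUES (?, ?)",
                 [(1, "alpha"), (2, "beta"), (3, "gamma")])
# Hypothetical clicks: site -> {partner_id: number of clicks}
clicks = {"siteA": {1: 3, 2: 2, 3: 1}, "siteB": {3: 4, 1: 1}}
for site, per_partner in clicks.items():
    for pid, k in per_partner.items():
        conn.executemany("INSERT INTO F_Test VALUES (?, ?)", [(site, pid)] * k)

print(top_partners(conn, 2))
```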
I have a main table named tblorder.
It contains CUID (Customer ID), CuName (Customer Name) and OrDate (Order Date), which are the columns I care about.
It is currently ordered by date in ascending order (e.g. 2001 before 2002).
Objective:
I am trying to retrieve the CUID and CuName of the most recent 1 million DISTINCT customers and insert them into a temp table (#Recent1M) for later joins.
So I:
Would need ORDER BY DESC to flip the date order and retrieve the most recent 1 million customers
Only want the first 1 million DISTINCT customers' information (CUID, CuName)
I know the following code is not correct, but it shows the main idea. I just can't figure out the correct syntax. So far, a WHILE loop with SELECT INTO seems like the most plausible solution.
SQL Platform: SSMS
Declare #DC integer
Set #DC = Count(distinct(CUID)) from #Recent1M))
While (#DC <1000000)
Begin
Select CuID,CuName into #Recent1MCus from tblorder
End
Thank you very much, I appreciate any help!
TOP 1000000 is the way to go, but you're going to need an ORDER BY clause or you will get arbitrary results. In your case, you mentioned that you wanted the most recent ones, so:
ORDER BY OrDate DESC
Also, you might consider using GROUP BY rather than DISTINCT. I think it looks cleaner and leaves the select list free for whatever else you might want to include (as I took the liberty of doing). Notice that, because of the grouping, the ORDER BY now uses MAX(ordate), since customers can presumably have multiple order dates and we are interested in the most recent. So:
select top 1000000 cuid, cuname, sum(order_value) as ca_ching, count(distinct(order_id)) as order_count
into #Recent1MCus
from tblorder
group by cuid, cuname
order by max(ordate) desc
I hope this helps.
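The GROUP BY plus ORDER BY MAX(ordate) idea can be tried in SQLite through Python's sqlite3 module. SQLite has no TOP or SELECT ... INTO #temp, so this sketch uses LIMIT and simply returns the rows. The sample orders are hypothetical, dates are stored as ISO-8601 strings so they sort correctly, and COUNT(*) replaces the answer's order_value/order_id columns, which the question does not mention.

```python
import sqlite3

def most_recent_customers(conn, n):
    """Distinct customers ordered by their latest order date, newest first."""
    sql = """
        SELECT cuid, cuname, COUNT(*) AS order_count
        FROM tblorder
        GROUP BY cuid, cuname
        ORDER BY MAX(ordate) DESC   -- latest order per customer drives the order
        LIMIT ?                     -- SQLite's equivalent of TOP n
    """
    return conn.execute(sql, (n,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblorder (cuid INT, cuname TEXT, ordate TEXT)")
conn.executemany("INSERT INTO tblorder VALUES (?, ?, ?)", [
    (1, "Ann", "2001-03-01"), (2, "Bob", "2002-05-10"),
    (1, "Ann", "2003-01-15"), (3, "Cy", "2002-08-20"),
])
print(most_recent_customers(conn, 2))  # [(1, 'Ann', 2), (3, 'Cy', 1)]
```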
Wouldn't you just do this?
select distinct top 1000000 cuid, cuname
into #Recent1MCus
from tblorder;
If the names might not be distinct, you can do:
select top 1000000 cuid, cuname
into #Recent1MCus
from (select o.*, row_number() over (partition by cuid order by ordate desc) as seqnum
from tblorder o
) o
where seqnum = 1
order by ordate desc;
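The ROW_NUMBER() variant keeps exactly one row per cuid even when the same customer appears under different names. Here is a sketch in SQLite via Python (window functions need SQLite 3.25+), with LIMIT standing in for TOP and invented sample data in which customer 1 changed name; the outer ORDER BY is what makes the cut keep the most recent customers.

```python
import sqlite3

def latest_customer_rows(conn, n):
    """One row per customer (their latest order), newest customers first."""
    sql = """
        SELECT cuid, cuname, ordate
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY t.cuid
                                        ORDER BY t.ordate DESC) AS seqnum
              FROM tblorder t) ranked
        WHERE seqnum = 1            -- keep only each customer's latest row
        ORDER BY ordate DESC
        LIMIT ?
    """
    return conn.execute(sql, (n,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblorder (cuid INT, cuname TEXT, ordate TEXT)")
conn.executemany("INSERT INTO tblorder VALUES (?, ?, ?)", [
    (1, "Ann", "2001-03-01"), (1, "Anne", "2003-01-15"),
    (2, "Bob", "2002-05-10"), (3, "Cy", "2002-08-20"),
])
print(latest_customer_rows(conn, 2))  # [(1, 'Anne', '2003-01-15'), (3, 'Cy', '2002-08-20')]
```

A plain GROUP BY cuid, cuname would return customer 1 twice here, once per name.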
Use DISTINCT and ORDER BY <colname> DESC to get latest unique records.
Try this SQL query:
SELECT DISTINCT top 1000000
cuid,
cuname
INTO #Recent1MCus
FROM tblorder
ORDER BY OrDate DESC;
This is my code:
SELECT *
FROM Event_list
WHERE interest in
(
SELECT Interest_name
from Interest
where Interest_id in
(
SELECT Interest_id
FROM `User's Interests`
where P_id=Pid and is_canceled=0
)
)
order by count(Eid) desc
I don't use any GROUP BY clause but still only get one row. When I remove the ORDER BY clause, I get all the correct rows (but not in the right order).
I'm trying to return a view (named Event_list) sorted by most common Eid (Event id), but I want to see every row without any grouping.
COUNT() is a group function, so using it will automatically result in grouping of rows. This is why you get only one row in your result when you use it in your ORDER BY clause.
Unfortunately, it's not clear what you're trying to do, so I can't tell you how to rewrite your query to get your desired results.
I suspect the query you want is more like this:
SELECT el.*,
(select count(*)
from interest i join
UserInterests ui
on ui.interest_id = i.interest_id and ui.is_canceled = 0
where el.interest = i.interest_name
) as cnt
FROM Event_list el
ORDER BY cnt desc;
It is a bit hard to tell without sample data and a better formed query. Some notes:
Don't use special characters in table and column names. Having to escape the names merely leads to queries that are harder to read, write, and understand.
Qualify column names, so you know which table each column comes from.
Use table aliases -- so queries are easier to write and to read.
The WHERE clause only does filtering. Your description of the problem doesn't seem to involve filtering, only ordering.
Any time you use an aggregation function, the query automatically becomes an aggregation query. Without a group by, exactly one row is returned.
Give foreign keys the same names as primary keys, where possible.
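The key idea, computing the count in a correlated subquery in the select list so the outer query is never aggregated, can be sketched in SQLite. This sketch assumes "most common Eid" means how often each eid appears in event_list itself and skips the Interest/User's Interests join; the table and data are hypothetical. All six rows come back, ordered by their count.

```python
import sqlite3

def events_by_frequency(conn):
    """Every row of event_list, ordered by how often its eid occurs (most common first)."""
    sql = """
        SELECT el.eid, el.interest,
               (SELECT COUNT(*) FROM event_list e2
                WHERE e2.eid = el.eid) AS cnt     -- per-row count, no GROUP BY
        FROM event_list el
        ORDER BY cnt DESC, el.eid, el.interest
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_list (eid INT, interest TEXT)")
conn.executemany("INSERT INTO event_list VALUES (?, ?)", [
    (10, "chess"), (20, "go"), (10, "music"),
    (10, "art"), (20, "chess"), (30, "art"),
])
rows = events_by_frequency(conn)
for row in rows:
    print(row)
```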
You may try:
SELECT L.* , C.Cnt
FROM Event_list L
LEFT JOIN (
SELECT E.EID, COUNT(*) AS Cnt
FROM Event_List E
JOIN Interest I
ON E.Interest = I.Interest_name
JOIN `User's Interests` U
ON U.Interest_id = I.Interest_id
Where U.P_id=Pid and U.is_canceled=0
GROUP BY E.EID
) C
ON L.Eid = C.Eid
Order By Cnt DESC
I don't have the tables to test, so you may need to correct column names and other conditions. This is just to give you the idea.
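The LEFT JOIN to an aggregated subquery keeps every event row, with a NULL count where nothing matched. This SQLite sketch assumes the schema implied by the question: the `User's Interests` table is renamed to the hypothetical user_interests, `Pid` is treated as a bound parameter, and all rows are invented. SQLite, like MySQL, sorts NULL last under DESC.

```python
import sqlite3

def events_with_counts(conn, pid):
    """All event_list rows, each with its per-event interest count for person pid (or NULL)."""
    sql = """
        SELECT l.eid, l.interest, c.cnt
        FROM event_list l
        LEFT JOIN (SELECT e.eid, COUNT(*) AS cnt
                   FROM event_list e
                   JOIN interest i ON e.interest = i.interest_name
                   JOIN user_interests u ON u.interest_id = i.interest_id
                   WHERE u.p_id = ? AND u.is_canceled = 0
                   GROUP BY e.eid) c
          ON l.eid = c.eid
        ORDER BY c.cnt DESC, l.eid, l.interest
    """
    return conn.execute(sql, (pid,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interest (interest_id INTEGER PRIMARY KEY, interest_name TEXT)")
conn.execute("CREATE TABLE user_interests (p_id INT, interest_id INT, is_canceled INT)")
conn.execute("CREATE TABLE event_list (eid INT, interest TEXT)")
conn.executemany("INSERT INTO interest VALUES (?, ?)", [(1, "chess"), (2, "go"), (3, "art")])
conn.executemany("INSERT INTO user_interests VALUES (?, ?, ?)",
                 [(7, 1, 0), (7, 2, 0), (8, 1, 0), (7, 3, 1)])  # (7, 3) is canceled
conn.executemany("INSERT INTO event_list VALUES (?, ?)",
                 [(10, "chess"), (10, "go"), (20, "go"), (30, "art")])
print(events_with_counts(conn, 7))
# [(10, 'chess', 2), (10, 'go', 2), (20, 'go', 1), (30, 'art', None)]
```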
I've been reading about this for the past day (even here), and have not found a suitable resource, so I'm popping it out there again :)
Check out these two queries:
SELECT DISTINCT transactions.StoreNumber FROM transactions WHERE PersonID=2 ORDER BY transactions.transactionID DESC;
and
SELECT GROUP_CONCAT(DISTINCT transactions.StoreNumber ORDER BY transactions.transactionID DESC SEPARATOR ',') FROM transactions WHERE PersonID=2 ORDER BY transactions.transactionID DESC;
From everything I've read I would expect the two queries to return the same results, with the second set grouped into CSV. They're not though.
Result set for query 1 (picture each value in its own row, formatting results here is cumbersome):
'611'
'345'
'340'
'310'
'327'
'323'
'362'
'360'
'330'
'379'
'356'
'367'
'375'
'306'
'354'
'389'
'343'
'346'
'357'
'733'
'370'
'347'
'703'
'355'
'341'
'342'
'358'
'351'
'319'
'365'
'372'
'368'
'353'
'363'
'349'
'369'
'336'
'364'
'202'
'366'
'416'
'731'
Result Set for query 2:
611,379,375,389,703,355,351,372,368,362,342,365,353,341,733,347,336,319,354,306,345,364,202,358,370,343,366,349,356,367,369,416,323,346,731,360,363,330,310,357,340,327
If I remove the DISTINCT clause, the results line up.
Can anyone point out what I'm doing wrong with the difference between the queries above?
The fact that removing DISTINCT from each query returns the same result indicates that DISTINCT is problematic within GROUP_CONCAT. Performing a GROUP BY outside the GROUP_CONCAT causes multiple rows to be returned, which isn't what I'm after.
Any ideas on how I can get a GROUP_CONCAT DISTINCT list of StoreNumber, in order of TransactionID DESC?
Thanks all
Consider your first query:
SELECT DISTINCT transactions.StoreNumber
FROM transactions
WHERE PersonID=2
ORDER BY transactions.transactionID DESC;
This is equivalent to:
SELECT transactions.StoreNumber
FROM transactions
WHERE PersonID=2
group by transactions.StoreNumber
ORDER BY transactions.transactionID DESC;
You are ordering by something that is not in the select list. So, MySQL chooses an arbitrary transactionid for each store number. This may differ from one execution to another.
I believe the same thing is happening in the group_concat(). The issue is that the arbitrary transactionid chosen for each store number can differ between the two queries.
If you want consistency, consider these two queries:
SELECT transactions.StoreNumber
FROM transactions
WHERE PersonID=2
group by transactions.StoreNumber
ORDER BY min(transactions.transactionID) DESC;
and:
SELECT GROUP_CONCAT(DISTINCT t.StoreNumber ORDER BY t.mintransactionID DESC SEPARATOR ',')
FROM (select t.StoreNumber, min(TransactionId) as minTransactionId
from transactions t
WHERE PersonID=2
group by t.StoreNumber
) t
These should produce the same results.
Before you complain too loudly about MySQL, any other database would return an error on the first query, because, when using select distinct, you can only order by columns in the select list (or expressions composed of them).
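Giving each store one deterministic sort key (here MIN(transactionID), as in the answer) is what makes the order repeatable. This SQLite sketch builds the comma-separated list by joining rows in Python, because not every SQLite version accepts an ORDER BY inside group_concat(); the table contents are invented.

```python
import sqlite3

def store_list(conn, person_id):
    """Distinct store numbers for a person, ordered by each store's MIN(transactionID) desc."""
    rows = conn.execute("""
        SELECT StoreNumber
        FROM transactions
        WHERE PersonID = ?
        GROUP BY StoreNumber
        ORDER BY MIN(transactionID) DESC   -- one well-defined key per store
    """, (person_id,)).fetchall()
    return ",".join(str(r[0]) for r in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (transactionID INTEGER PRIMARY KEY, "
             "PersonID INT, StoreNumber INT)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    (1, 2, 611), (2, 2, 345), (3, 2, 611),
    (4, 2, 340), (5, 3, 999), (6, 2, 345),
])
print(store_list(conn, 2))  # 340,345,611
```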
I'm a MySQL query newbie, so I'm sure this is a question with an obvious answer.
But I was looking at these two queries. Will they return different result sets? I understand that the sorting process would proceed differently, but I believe they will return the same results, with the first query being slightly more efficient?
Query 1: HAVING, then AND
SELECT user_id
FROM forum_posts
GROUP BY user_id
HAVING COUNT(id) >= 100
AND user_id NOT IN (SELECT user_id FROM banned_users)
Query 2: WHERE, then HAVING
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN(SELECT user_id FROM banned_users)
GROUP BY user_id
HAVING COUNT(id) >= 100
Actually, the first query will be less efficient (HAVING is applied after GROUP BY, so banned users are only removed after grouping).
UPDATE
Some pseudo code to illustrate how your queries are executed ([very] simplified version).
First query:
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_users
3. Group, count, etc.
4. Exclude records from the first result set if they are present in the second
Second query:
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_users
3. Exclude records from the first result set if they are present in the second
4. Group, count, etc.
The order of steps 1 and 2 is not important; MySQL can choose whatever it thinks is better. The important difference is in steps 3 and 4: HAVING is applied after GROUP BY. Grouping is usually more expensive than joining (excluding records can be considered a join operation in this case), so the fewer records it has to group, the better the performance.
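The two pseudo-code pipelines can be modeled in a few lines of Python. This is a toy sketch, not MySQL's executor: each post is represented only by its user_id, `collections.Counter` stands in for GROUP BY/COUNT, and the function names and sample data are made up for illustration. Both orders return the same users; the difference is how many rows reach the grouping step.

```python
from collections import Counter

def group_then_filter(posts, banned, min_posts):
    """Query 1 order: group and count every post, then drop banned users."""
    counts = Counter(posts)  # every row is grouped
    result = {u for u, c in counts.items() if c >= min_posts and u not in banned}
    return result, len(posts)  # (qualifying users, rows fed to grouping)

def filter_then_group(posts, banned, min_posts):
    """Query 2 order: drop banned users' posts first (WHERE), then group the rest."""
    kept = [u for u in posts if u not in banned]
    counts = Counter(kept)  # only non-banned rows are grouped
    result = {u for u, c in counts.items() if c >= min_posts}
    return result, len(kept)

# Hypothetical data: one list entry per forum post, holding its user_id.
posts = [1] * 3 + [2] * 5 + [3] * 4
banned = {2}
print(group_then_filter(posts, banned, 4))  # ({3}, 12)
print(filter_then_group(posts, banned, 4))  # ({3}, 7)
```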
You already have answers saying that the two queries will return the same results, and various opinions on which one is more efficient.
My opinion is that there will be a difference in efficiency (speed) only if the optimizer produces different plans for the two queries. I think that in recent MySQL versions the optimizer is smart enough to find the same plan for either query, so there will be no difference at all. Of course, you can check this by comparing the execution plans with EXPLAIN or by running both queries against some test tables.
I would use the second version in any case, just to be safe.
Let me add that:
COUNT(*) is usually more efficient than COUNT(notNullableField) in MySQL. Until that is fixed in future MySQL versions, use COUNT(*) where applicable.
Therefore, you can also use:
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN
( SELECT user_id FROM banned_users )
GROUP BY user_id
HAVING COUNT(*) >= 100
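There are also other ways to get the same result as NOT IN before applying GROUP BY (they are equivalent as long as banned_users.user_id contains no NULLs; a single NULL there makes NOT IN return no rows).
Using LEFT JOIN / IS NULL: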
SELECT fp.user_id
FROM forum_posts AS fp
LEFT JOIN banned_users AS bu
ON bu.user_id = fp.user_id
WHERE bu.user_id IS NULL
GROUP BY fp.user_id
HAVING COUNT(*) >= 100
Using NOT EXISTS:
SELECT fp.user_id
FROM forum_posts AS fp
WHERE NOT EXISTS
( SELECT *
FROM banned_users AS bu
WHERE bu.user_id = fp.user_id
)
GROUP BY fp.user_id
HAVING COUNT(*) >= 100
Which of the 3 methods is faster depends on your table sizes and a lot of other factors, so best is to test with your data.
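The three anti-join forms can be compared side by side in SQLite. In this sketch (hypothetical tables and data) they agree, until a NULL is added to banned_users: NOT IN then returns nothing, because `x NOT IN (..., NULL)` is never true, while LEFT JOIN / IS NULL and NOT EXISTS are unaffected.

```python
import sqlite3

QUERIES = {
    "not_in": """
        SELECT user_id FROM forum_posts
        WHERE user_id NOT IN (SELECT user_id FROM banned_users)
        GROUP BY user_id HAVING COUNT(*) >= ?
        ORDER BY user_id""",
    "left_join": """
        SELECT fp.user_id FROM forum_posts AS fp
        LEFT JOIN banned_users AS bu ON bu.user_id = fp.user_id
        WHERE bu.user_id IS NULL
        GROUP BY fp.user_id HAVING COUNT(*) >= ?
        ORDER BY fp.user_id""",
    "not_exists": """
        SELECT fp.user_id FROM forum_posts AS fp
        WHERE NOT EXISTS (SELECT * FROM banned_users AS bu
                          WHERE bu.user_id = fp.user_id)
        GROUP BY fp.user_id HAVING COUNT(*) >= ?
        ORDER BY fp.user_id""",
}

def run_all(conn, min_posts):
    """Run each variant and return {name: [user_id, ...]}."""
    return {name: [r[0] for r in conn.execute(sql, (min_posts,))]
            for name, sql in QUERIES.items()}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, user_id INT)")
conn.execute("CREATE TABLE banned_users (user_id INT)")
conn.executemany("INSERT INTO forum_posts (user_id) VALUES (?)",
                 [(1,)] * 3 + [(2,)] * 5 + [(3,)] * 4)
conn.execute("INSERT INTO banned_users VALUES (2)")
print(run_all(conn, 4))  # {'not_in': [3], 'left_join': [3], 'not_exists': [3]}

conn.execute("INSERT INTO banned_users VALUES (NULL)")  # a NULL in the subquery
print(run_all(conn, 4))  # {'not_in': [], 'left_join': [3], 'not_exists': [3]}
```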
HAVING conditions are applied to the grouped results, and since you group by user_id, all of its possible values are present in the grouped result, so the placement of the user_id condition does not affect the result.
To me, the second query is more efficient because it reduces the number of records passed to GROUP BY and HAVING.
Alternatively, you may try the following query to avoid using IN:
SELECT `fp`.`user_id`
FROM `forum_posts` `fp`
LEFT JOIN `banned_users` `bu` ON `fp`.`user_id` = `bu`.`user_id`
WHERE `bu`.`user_id` IS NULL
GROUP BY `fp`.`user_id`
HAVING COUNT(`fp`.`id`) >= 100
Hope this helps.
No, they do not give the same results,
because the first query filters records by the count(id) condition,
while the other query filters records first and then applies the HAVING clause.
The second query is correctly written.