I have this simple table:
mysql> select deviceId, eventName, loggedAt from foo;
+----------+------------+---------------------+
| deviceId | eventName | loggedAt |
+----------+------------+---------------------+
| 1 | foo | 2020-09-18 21:27:21 |
| 1 | bar | 2020-09-18 21:27:26 |
| 1 | last event | 2020-09-18 21:27:43 | <--
| 2 | xyz | 2020-09-18 21:27:37 |
| 2 | last event | 2020-09-18 21:27:55 | <--
| 3 | last one | 2020-09-18 21:28:04 | <--
+----------+------------+---------------------+
and I want to select one row per deviceId with the most recent loggedAt. I've marked those rows with an arrow in the table above for clarity.
If I append group by deviceId to the above query, I get the notorious error:
Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'foo.eventName' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
and I don't want to change the sql_mode.
I've come pretty close to what I want using:
select deviceId, any_value(eventName), max(loggedAt) from foo group by deviceId;
but any_value() returns an arbitrary eventName from each group, not necessarily the one from the most recent row.
How can I solve this?
ONLY_FULL_GROUP_BY is a good thing: it enforces fundamental SQL standard rules, about which MySQL has been lax for a long time. Even if you disabled it, you would get the same result as you are getting with any_value().
You have a top-1-per-group problem, where you want the entire row that has the most recent date for each device. Aggregation is not the right tool for that; what you need is to filter the dataset.
One option uses a correlated subquery:
select f.*
from foo f
where f.loggedat = (
select max(f1.loggedat) from foo f1 where f1.deviceid = f.deviceid
)
In MySQL 8.0, you can also use row_number():
select *
from (
select f.*, row_number() over(partition by deviceid order by loggedat desc) rn
from foo f
) f
where rn = 1
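As a quick check of the correlated-subquery approach, here is a minimal sketch that loads the sample rows from the question and runs the query. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption made only so the example is self-contained); the SQL itself is the same shape as above.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL so the demo runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (deviceId INTEGER, eventName TEXT, loggedAt TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [
        (1, "foo", "2020-09-18 21:27:21"),
        (1, "bar", "2020-09-18 21:27:26"),
        (1, "last event", "2020-09-18 21:27:43"),
        (2, "xyz", "2020-09-18 21:27:37"),
        (2, "last event", "2020-09-18 21:27:55"),
        (3, "last one", "2020-09-18 21:28:04"),
    ],
)

# Keep a row only if its loggedAt equals the maximum loggedAt of its device.
latest_rows = conn.execute(
    """
    SELECT f.deviceId, f.eventName, f.loggedAt
    FROM foo f
    WHERE f.loggedAt = (
        SELECT MAX(f1.loggedAt) FROM foo f1 WHERE f1.deviceId = f.deviceId
    )
    ORDER BY f.deviceId
    """
).fetchall()

for row in latest_rows:
    print(row)
```

Note that if two rows of the same device share the same maximum loggedAt, this filter returns both of them; the row_number() variant returns exactly one.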
Related
I have a record table and its comment table, like:
| commentId | relatedRecordId | isRead |
|-----------+-----------------+--------|
| 1 | 1 | TRUE |
| 2 | 1 | FALSE |
| 3 | 1 | FALSE |
Now I want to select newCommentCount and allCommentCount as a server response to the browser. Is there any way to select these two fields in one SQL?
I've tried this:
SELECT `isRead`, count(*) AS cnt FROM comment WHERE relatedRecordId=1 GROUP BY `isRead`
| isRead | cnt |
| FALSE | 2 |
| TRUE | 1 |
But, I have to use a special data structure to map it and sum the cnt fields in two rows to get allCommentCount by using an upper-layer programming language. I want to know if I could get the following format of data by SQL only and in one step:
| newCommentCount | allCommentCount |
|-----------------+-----------------|
| 2 | 3 |
I don't even know how to describe the question, so I couldn't find any results on Google or Stack Overflow (maybe because of my poor English).
Use conditional aggregation:
SELECT SUM(NOT isRead) AS newCommentCount, COUNT(*) AS allCommentCount
FROM comment
WHERE relatedRecordId = 1;
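A minimal sketch of the conditional aggregation on the sample data, again using Python's sqlite3 module as a stand-in for MySQL (an assumption; isRead is stored as 0/1 here, which is also how MySQL represents BOOLEAN):

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; TRUE/FALSE are stored as 1/0.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comment (commentId INTEGER, relatedRecordId INTEGER, isRead INTEGER)"
)
conn.executemany(
    "INSERT INTO comment VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 0), (3, 1, 0)],
)

# NOT isRead evaluates to 1 for unread rows, so SUM() counts them.
new_count, all_count = conn.execute(
    """
    SELECT SUM(NOT isRead) AS newCommentCount, COUNT(*) AS allCommentCount
    FROM comment
    WHERE relatedRecordId = 1
    """
).fetchone()

print(new_count, all_count)
```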
If I understand correctly, you want to show the count of new comments and the count of all comments, so you can do it like this:
SELECT SUM(CASE WHEN isRead = FALSE THEN 1 ELSE 0 END) AS newCommentCount,
COUNT(*) AS allCommentCount FROM comment WHERE relatedRecordId = 1
You can also wrap it in a stored procedure.
To place two result sets side by side, you can simply use a subquery as an expression in the SELECT clause, as long as the number of rows in the result sets match (here, one row each):
select (select count(*) from c_table where isread=false and relatedRecordId=1 ) as newCommentCount,
count(*) as allCommentCount
from c_table where relatedRecordId=1;
According to the MySQL documentation I read, SELECT executes before GROUP BY. I have a table named Views (shown below) and the following query:
select distinct(viewer_id) as id
from Views v
group by viewer_id,view_date
having count(distinct(article_id))>1;
In this query, if SELECT is performed before GROUP BY as the documentation says, how can it group by view_date when only viewer_id is selected? This has really confused me about the exact order in which GROUP BY and SELECT work.
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date |
+------------+-----------+-----------+------------+
| 1 | 3 | 5 | 2019-08-01 |
| 3 | 4 | 5 | 2019-08-01 |
| 1 | 3 | 6 | 2019-08-02 |
| 2 | 7 | 7 | 2019-08-01 |
| 2 | 7 | 6 | 2019-08-02 |
| 4 | 7 | 1 | 2019-07-22 |
| 3 | 4 | 4 | 2019-07-21 |
| 3 | 4 | 4 | 2019-07-21 |
+------------+-----------+-----------+------------+
There is no order to the evaluation. A SQL query describes the result set.
It is true that MySQL has a rather naive optimizer, so you can often see what the resulting query will be. But you should not think of the clauses as being evaluated in a particular order.
You might be confusing evaluation of the query with scoping rules. These affect how a particular identifier is determined.
You should not think in terms of order of execution, but rather in terms of correctness of the statement. The select clause must be consistent with the group by clause: that is, any column that is present in the select clause and that is not part of an aggregate function must belong to the group by clause.
It is, on the other hand, perfectly valid to have columns in the group by clause that do not belong to the select clause, although the results might be a bit difficult to understand, because some information is missing about how the groups were built.
If we remove the distinct in the select clause, your query would read:
select viewer_id as id
from views v
group by viewer_id, view_date
having count(distinct(article_id)) > 1;
This returns a viewer_id for every view_date on which that viewer has more than one distinct article_id. A given viewer_id may appear more than once in the result set, if they satisfied the condition on more than one date.
Then, distinct filters out duplicate viewer_ids: as a result, you get the list of viewers who viewed more than one article on at least one date.
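The two stages described above can be sketched on the sample data. This uses Python's sqlite3 module as a stand-in for MySQL (an assumption made only to keep the example runnable); view_date is added to the first select list purely to make the groups visible.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Views (article_id INTEGER, author_id INTEGER, "
    "viewer_id INTEGER, view_date TEXT)"
)
conn.executemany(
    "INSERT INTO Views VALUES (?, ?, ?, ?)",
    [
        (1, 3, 5, "2019-08-01"),
        (3, 4, 5, "2019-08-01"),
        (1, 3, 6, "2019-08-02"),
        (2, 7, 7, "2019-08-01"),
        (2, 7, 6, "2019-08-02"),
        (4, 7, 1, "2019-07-22"),
        (3, 4, 4, "2019-07-21"),
        (3, 4, 4, "2019-07-21"),
    ],
)

# Stage 1: one row per (viewer_id, view_date) group that passes the HAVING filter.
groups = conn.execute(
    """
    SELECT viewer_id, view_date
    FROM Views
    GROUP BY viewer_id, view_date
    HAVING COUNT(DISTINCT article_id) > 1
    ORDER BY viewer_id
    """
).fetchall()

# Stage 2: DISTINCT collapses viewers that qualified on several dates.
viewer_ids = conn.execute(
    """
    SELECT DISTINCT viewer_id AS id
    FROM Views
    GROUP BY viewer_id, view_date
    HAVING COUNT(DISTINCT article_id) > 1
    ORDER BY viewer_id
    """
).fetchall()

print(groups)
print(viewer_ids)
```

With this particular data no viewer qualifies on two different dates, so both results list the same viewers; viewer 4 is excluded because the two views on 2019-07-21 are of the same article.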
I'm writing a cronjob that runs analysis on a flags table in my database, structured as such:
| id | item | def | time_flagged | time_resolved | status |
+----+------+-----+--------------+---------------+---------+
| 1 | 1 | foo | 1519338608 | 1519620669 | MISSED |
| 2 | 1 | bar | 1519338608 | (NULL) | OPEN |
| 3 | 2 | bar | 1519338608 | 1519620669 | IGNORED |
| 4 | 1 | foo | 1519620700 | (NULL) | OPEN |
For each distinct def, for each unique price, I want to get the "latest" row (IFNULL(`time_resolved`, `time_flagged`) AS `time`). If no such row exists for a given def-item combination, that's okay; I just don't want any duplicates for a given def-item combination.
For the above data set, I would like to select:
| def | item | time | status |
+-----+------+------------+---------+
| foo | 1 | 1519620700 | OPEN |
| bar | 1 | 1519338608 | OPEN |
| bar | 2 | 1519620669 | IGNORED |
Row 1 is not included because it's "overridden" by row 4, as both rows have the same def-item combination, and the latter has a more recent time.
The data set will have a few dozen distinct defs, a few hundred distinct items, and a very large number of flags that will only increase over time.
How can I go about doing this? I see the greatest-n-per-group tag is rife with similar questions but I don't see any that involve my specific circumstance of needed "nested grouping" across two columns.
You could try:
select distinct def, item, IFNULL(time_resolved, time_flagged) AS time, status
from flags A
where IFNULL(time_resolved, time_flagged) = (
select MAX(IFNULL(time_resolved, time_flagged))
from flags B
where A.item = B.item and A.def = B.def
)
I know it's not the best approach but it might work for you
Do you mean 'for each unique Def and each unique Item'? If so, a group by of multiple columns seems like it would work (shown as a temp table t) joined back to the original table to grab the rest of the data:
select
f.def,
f.item,
t.time,
f.status
from
flags f
join (select
def,
item,
max(ifnull(time_resolved, time_flagged)) time
from flags
group by def, item) t
on
f.def = t.def and
f.item = t.item and
ifnull(f.time_resolved, f.time_flagged) = t.time
Depending on your version of MySQL (window functions require 8.0 or later), you can use a window function:
SELECT def, item, time, status
FROM (
SELECT
def,
item,
COALESCE(time_resolved, time_flagged) AS time,
status,
RANK() OVER(PARTITION BY def, item ORDER BY COALESCE(time_resolved, time_flagged) DESC) MyRank -- Rank each (def, item) combination by "time"
FROM MyTable
) src
WHERE MyRank = 1 -- Only return top-ranked (i.e. most recent) rows per (def, item) grouping
If you can have a (def, item) combo with the same "time" value, then change RANK() to ROW_NUMBER. This will guarantee you only get one row per grouping.
select f.def, f.item, a.time, f.status
from flags f
join (select
def, item, MAX(COALESCE(time_resolved, time_flagged)) as time
from flags
group by def, item) a
on f.def = a.def and
f.item = a.item and
COALESCE(f.time_resolved, f.time_flagged) = a.time
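To check the group-then-join-back approach against the expected output in the question, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption). The table name flags and the column names are taken from the question; the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; (NULL) cells become None.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flags (id INTEGER, item INTEGER, def TEXT, "
    "time_flagged INTEGER, time_resolved INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO flags VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "foo", 1519338608, 1519620669, "MISSED"),
        (2, 1, "bar", 1519338608, None, "OPEN"),
        (3, 2, "bar", 1519338608, 1519620669, "IGNORED"),
        (4, 1, "foo", 1519620700, None, "OPEN"),
    ],
)

# Inner query: latest effective time per (def, item).
# Outer join: fetch the full row that carries that time.
latest_flags = conn.execute(
    """
    SELECT f.def, f.item, a.time, f.status
    FROM flags f
    JOIN (
        SELECT def, item, MAX(COALESCE(time_resolved, time_flagged)) AS time
        FROM flags
        GROUP BY def, item
    ) a
      ON f.def = a.def
     AND f.item = a.item
     AND COALESCE(f.time_resolved, f.time_flagged) = a.time
    ORDER BY f.id
    """
).fetchall()

for row in latest_flags:
    print(row)
```

Row 1 drops out because row 4 has the same (def, item) pair with a later time. If two rows of one pair tie on the latest time, both are returned.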
I have the following table:
+------+-------+--------------------------------------+
| id | rev | content |
+------+-------+--------------------------------------+
| 1 | 1 | ... |
| 2 | 1 | ... |
| 1 | 2 | ... |
| 1 | 3 | ... |
+------+-------+--------------------------------------+
When I run the following query:
SELECT id, MAX(rev) maxrev, content
FROM YourTable
GROUP BY id;
I get:
+------+----------+--------------------------------------+
| id | maxrev | content |
+------+----------+--------------------------------------+
| 1 | 3 | ... |
| 2 | 1 | ... |
+------+----------+--------------------------------------+
But if I remove the GROUP BY clause as follows:
SELECT id, MAX(rev) maxrev, content
FROM YourTable;
I get:
+------+----------+--------------------------------------+
| id | maxrev | content |
+------+----------+--------------------------------------+
| 1 | 3 | ... |
+------+----------+--------------------------------------+
This is counter-intuitive to me because of the expectation that a GROUP BY would reduce the number of results by eliminating duplicate values. However, in the above case, introduction of the GROUP BY does the opposite. Is this because of the MAX() function, and if so, how?
PS: The table is based on the SO question here: SQL select only rows with max value on a column. I was trying to understand the answer to that question, and in the process, came across the above situation.
EDIT:
I got the above results on sqlfiddle.com using its MySQL 5.6 engine, with no customization/configuration.
MAX() is evaluated per group, as defined by your GROUP BY clause. So your first query says: give me the maximum rev for each id, whereas the second just says: give me the maximum rev overall.
Thanks to xQbert:
This does NOT mean that you are getting the row with the max rev in the latter case. It will take values from anywhere in the selection to use for your id and content fields.
You can read more about how SQL handles the GROUP BY statement here: Documentation
This is because you are using a version earlier than MySQL 5.7. Those versions allow an aggregate function to be combined with selected columns that are not in the group by clause, which produces unpredictable results for the non-aggregated columns. In MySQL 5.7, with the default ONLY_FULL_GROUP_BY mode, this behavior is not allowed: you get an error if the select list contains a non-aggregated column that is not mentioned in group by.
Of the two, the correct syntax is the first:
SELECT id, MAX(rev) maxrev, content
FROM YourTable
GROUP BY id;
SELECT id, MAX(rev) maxrev, content
FROM YourTable
GROUP BY id;
When you run this, as there are 2 distinct ids in the table you get two rows in the result, one per id with the max value. The grouping happens on the id column.
SELECT id, MAX(rev) maxrev, content
FROM YourTable;
If you remove the group by clause, you only get one row in the result corresponding to the max value in the entire table. There is no grouping by id.
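The difference in row counts can be reproduced with a small sketch, using Python's sqlite3 module as a stand-in for MySQL (an assumption). The content column is deliberately left out of the select list, since the value of a non-aggregated column is not well defined here and differs between engines.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "..."), (2, 1, "..."), (1, 2, "..."), (1, 3, "...")],
)

# With GROUP BY: MAX() is computed once per id, so there is one row per id.
per_id = conn.execute(
    "SELECT id, MAX(rev) AS maxrev FROM YourTable GROUP BY id ORDER BY id"
).fetchall()

# Without GROUP BY: the whole table is a single group, so there is one row.
overall = conn.execute("SELECT MAX(rev) AS maxrev FROM YourTable").fetchall()

print(per_id)
print(overall)
```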
I'm breaking my head over how to do this one in SQL. I have a table:
| User_id | Question_ID | Answer_ID |
| 1 | 1 | 1 |
| 1 | 2 | 10 |
| 2 | 1 | 2 |
| 2 | 2 | 11 |
| 3 | 1 | 1 |
| 3 | 2 | 10 |
| 4 | 1 | 1 |
| 4 | 2 | 10 |
It holds user answers to a particular question. A question might have multiple answers. A User cannot answer the same question twice. (Hence, there's only one Answer_ID per {User_id, Question_ID})
I'm trying to find an answer to this query: For a particular question and answer id (Related to the same question), I want to find the most common answer given to OTHER question by users with the given answer.
For example, For the above table:
For question_id = 1 -> For Answer_ID = 1 - (Question 2 - Answer ID 10)
For Answer_ID = 2 - (Question 2 - Answer ID 11)
Is it possible to do in one query? Should it be done in one query? Shall I just use stored procedure or Java for that one?
Though @rick-james is right, I am not sure that it is easy to get started when you do not know how queries like this are usually written for MySQL.
You need a query to find out the most common answers to questions:
SELECT
question_id,
answer_id,
COUNT(*) as cnt
FROM user_answers
GROUP BY 1, 2
ORDER BY 1, 3 DESC
This would return a table where for each question_id we output counts in descending order.
| 1 | 1 | 3 |
| 1 | 2 | 1 |
| 2 | 10 | 3 |
| 2 | 11 | 1 |
And now we should solve a so-called greatest-n-per-group task. The problem is that in MySQL, for the sake of performance, tasks like this are usually solved not in pure SQL, but using hacks that rely on knowledge of how queries are processed internally.
In this case we know that we can define a variable and then, while iterating over the prepared table, keep track of the previous row, which allows us to distinguish between the first row in a group and the others.
SELECT
question_id, answer_id, cnt,
IF(question_id=@q_id, NULL, @q_id:=question_id) as v
FROM (
SELECT
question_id, answer_id, COUNT(*) as cnt
FROM user_answers
GROUP BY 1, 2
ORDER BY 1, 3 DESC) cnts
JOIN (
SELECT @q_id:=-1
) as init;
Make sure that you have initialised the variable (and respect its data type on initialisation, otherwise it may be unexpectedly cast later). Here is the result:
| 1 | 1 | 3 | 1 |
| 1 | 2 | 1 |(null)|
| 2 | 10 | 3 | 2 |
| 2 | 11 | 1 |(null)|
Now we just need to filter out rows with NULL in the last column. Since the column is actually not needed we can move the same expression into the WHERE clause. The cnt column is actually not needed either, so we can skip it as well:
SELECT
question_id, answer_id
FROM (
SELECT
question_id, answer_id
FROM user_answers
GROUP BY 1, 2
ORDER BY 1, COUNT(*) DESC) cnts
JOIN (
SELECT @q_id:=-1
) as init
WHERE IF(question_id=@q_id, NULL, @q_id:=question_id) IS NOT NULL;
One last thing worth mentioning: for the query to be efficient you should have the correct indexes. This query requires an index starting with the (question_id, answer_id) columns. Since you need a UNIQUE index anyway, it makes sense to define it in this order: (question_id, answer_id, user_id).
CREATE TABLE user_answers (
user_id INTEGER,
question_id INTEGER,
answer_id INTEGER,
UNIQUE INDEX (question_id, answer_id, user_id)
) engine=InnoDB;
Here is an sqlfiddle to play with: http://sqlfiddle.com/#!9/bd12ad/20.
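The idea behind the variable trick (walk the ordered counts and keep a row only when question_id differs from the previous row) can be sketched outside MySQL as well. The following is an illustration only, not the MySQL user-variable mechanism itself: it uses Python's sqlite3 module as a stand-in (an assumption) and a plain Python variable, previous_question, in place of @q_id.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_answers (user_id INTEGER, question_id INTEGER, answer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO user_answers VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 10), (2, 1, 2), (2, 2, 11),
     (3, 1, 1), (3, 2, 10), (4, 1, 1), (4, 2, 10)],
)

# Counts per (question, answer), most frequent answer first within each question.
counts = conn.execute(
    """
    SELECT question_id, answer_id, COUNT(*) AS cnt
    FROM user_answers
    GROUP BY question_id, answer_id
    ORDER BY question_id, cnt DESC
    """
).fetchall()

# Plays the role of @q_id: remember the previous row's question_id.
previous_question = None
top_answers = []
for question_id, answer_id, cnt in counts:
    if question_id != previous_question:
        # First row of a new group, i.e. the most common answer.
        top_answers.append((question_id, answer_id))
        previous_question = question_id

print(counts)
print(top_answers)
```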
Do you want a fish? Or do you want to learn how to fish?
Your question seems to have multiple steps.
Fetch info about "questions by users with the given answer". Devise this SELECT and imagine that the results form a new table.
Apply the "OTHER" restriction. This is probably a minor AND ... != ... added to SELECT #1.
Now find the "most common answer". This probably involves ORDER BY COUNT(*) DESC LIMIT 1. It is likely to use a derived table:
SELECT ...
FROM ( select#2 )
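One possible way to put those steps together, as a sketch rather than the definitive query: a self-join finds the other answers of the users who gave the specified answer, and ORDER BY ... LIMIT 1 picks the most common one. The table name user_answers, the helper name most_common_other_answer, and the use of Python's sqlite3 module in place of MySQL are all assumptions made for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; table name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_answers (user_id INTEGER, question_id INTEGER, answer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO user_answers VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 10), (2, 1, 2), (2, 2, 11),
     (3, 1, 1), (3, 2, 10), (4, 1, 1), (4, 2, 10)],
)


def most_common_other_answer(question_id, answer_id):
    """Hypothetical helper: returns (question_id, answer_id, count) or None."""
    return conn.execute(
        """
        SELECT o.question_id, o.answer_id, COUNT(*) AS cnt
        FROM user_answers g              -- users who gave the specified answer
        JOIN user_answers o              -- the same users' answers ...
          ON o.user_id = g.user_id
         AND o.question_id != g.question_id   -- ... to OTHER questions
        WHERE g.question_id = ? AND g.answer_id = ?
        GROUP BY o.question_id, o.answer_id
        ORDER BY cnt DESC
        LIMIT 1
        """,
        (question_id, answer_id),
    ).fetchone()


for_answer_1 = most_common_other_answer(1, 1)
for_answer_2 = most_common_other_answer(1, 2)
print(for_answer_1)
print(for_answer_2)
```

If several answers tie for the highest count, LIMIT 1 returns an arbitrary one of them.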
Your question has multiple conditions. First you have to get the questions, together with the users who asked them, from the question table:
select question_id,user_id from question
Then insert the answer to the asked question, and do some checks in your Java code (for example: is the answering user the same user who asked the question, and has the user already answered this question?).
select question_id,user_id from question where user_id=asking-user_id -- gets all questions to show in the UI
select answer_id,user_id from answer where user_id=answering-user_id -- gets the answers given by that particular user