MySQL group/order behaves differently in 5.7

I have a table that looks like this:
id | text | language_id | other_id | dateCreated
1 | something | 1 | 5 | 2015-01-02
2 | something | 1 | 5 | 2015-01-01
3 | something | 2 | 5 | 2015-01-01
4 | something | 2 | 6 | 2015-01-01
and I want to get the latest row for each language_id among the rows that have other_id 5.
My query looks like this:
SELECT * FROM (
SELECT *
FROM tbl
WHERE other_id = 5
ORDER BY dateCreated DESC
) AS r
GROUP BY r.language_id
With MySQL 5.6 I get 2 rows with ID 1 and 3, which is what I want.
With MySQL 5.7.10 I get 2 rows with IDs 2 and 3 and it seems to me that the ORDER BY in the subquery is ignored.
Any idea what the problem might be?

You should go with the query below:
SELECT
*
FROM tbl
INNER JOIN
(
SELECT
other_id,
language_id,
MAX(dateCreated) max_date_created
FROM tbl
WHERE other_id = 5
GROUP BY other_id, language_id
) AS t
ON tbl.language_id = t.language_id AND tbl.other_id = t.other_id AND
tbl.dateCreated = t.max_date_created
Using GROUP BY without an aggregate function returns an arbitrary row from each group. You should not rely on which row GROUP BY returns; MySQL makes no guarantee about it.
Quoting from this post:
In a nutshell, MySQL allows omitting some columns from the GROUP BY,
for performance purposes, however this works only if the omitted
columns all have the same value (within a grouping), otherwise, the
value returned by the query are indeed indeterminate, as properly
guessed by others in this post. To be sure adding an ORDER BY clause
would not re-introduce any form of deterministic behavior.
Although not at the core of the issue, this example shows how using *
rather than an explicit enumeration of desired columns is often a bad
idea.
Excerpt from MySQL 5.0 documentation:
When using this feature, all rows in each group should have the same
values for the columns that are omitted from the GROUP BY part. The
server is free to return any value from the group, so the results are
indeterminate unless all values are the same.
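To check the join-based approach against the sample data from the question, here is a minimal sketch. It assumes Python's sqlite3 module with an in-memory SQLite database standing in for MySQL; the variable name latest_ids is only illustrative.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER, text TEXT, language_id INTEGER, "
    "other_id INTEGER, dateCreated TEXT)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [
        (1, "something", 1, 5, "2015-01-02"),
        (2, "something", 1, 5, "2015-01-01"),
        (3, "something", 2, 5, "2015-01-01"),
        (4, "something", 2, 6, "2015-01-01"),
    ],
)

# Find the latest date per language_id for other_id = 5, then join back
# to the table to fetch the full rows that carry that date.
latest_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT tbl.id
        FROM tbl
        INNER JOIN (
            SELECT other_id, language_id, MAX(dateCreated) AS max_date_created
            FROM tbl
            WHERE other_id = 5
            GROUP BY other_id, language_id
        ) AS t
          ON tbl.language_id = t.language_id
         AND tbl.other_id = t.other_id
         AND tbl.dateCreated = t.max_date_created
        ORDER BY tbl.id
        """
    )
]
print(latest_ids)  # [1, 3]
```

The result no longer depends on which row the server happens to pick for a group, because every selected column is either grouped or aggregated inside the subquery.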

Related

ORDER BY does not work if COUNT is used

I have a table with following content
loan_application
+----+---------+
| id | user_id |
+----+---------+
| 1 | 10 |
| 2 | 10 |
| 3 | 10 |
+----+---------+
I want to fetch the 3rd record only if there are 3 records available. In this case I want id 3, and the total count must be 3. Here is what I expect:
+--------------+----+
| COUNT(la.id) | id |
+--------------+----+
| 3 | 3 |
+--------------+----+
Here is the query I tried:
SELECT COUNT(la.id), la.id FROM loan_application la HAVING COUNT(la.id) = 3 ORDER BY la.id DESC;
However this gives me following result
+--------------+----+
| COUNT(la.id) | id |
+--------------+----+
| 3 | 1 |
+--------------+----+
The problem is that it returns id 1 even though I order by id descending, whereas I expect the id to be 3. Where am I going wrong?
Thanks.

In your case you can use this query:
SELECT COUNT(la.id), max(la.id) FROM loan_application la
GROUP BY user_id
I tested it against your table in my MySQL database.

When you have an aggregate function (in this instance count()) in the select list without a GROUP BY clause, MySQL returns a single record, with the function applied to the whole table.
Under certain configuration settings (when the ONLY_FULL_GROUP_BY SQL mode is disabled), MySQL allows you to include fields in the select list that are neither in the GROUP BY clause nor aggregated. MySQL pretty much picks the first value it encounters while scanning the data as the value for such fields, in your case the value 1 for id.
If you want to fetch the record where id=count of records within the table, then I would use the following query:
select *
from loan_application
join (select count(*) as numrows from loan_application) t
where id=t.numrows and t.numrows=3
However, this implies that the values within the id field are continuous and there are no gaps.

You are selecting la.id alongside an aggregate function (COUNT) without aggregating it. MySQL takes la.id from the first row it reads while the count continues over the remaining rows, so you get the first la.id, not the last. To get the last la.id, apply the MAX function to that field.
Here's the updated query:
SELECT
COUNT(la.id),
MAX(la.id)
FROM
loan_application la
GROUP BY user_id
HAVING
COUNT(la.id) = 3
N.B.: You are using COUNT without a GROUP BY clause, so the aggregate function is applied to the whole table.
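As a quick check of the COUNT/MAX query, here is a minimal sketch, again assuming Python's sqlite3 with an in-memory SQLite database in place of MySQL. The fourth row inserted at the end is hypothetical and only shows that HAVING filters the group out once the count is no longer 3.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan_application (id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO loan_application VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 10)],
)

query = """
    SELECT COUNT(la.id), MAX(la.id)
    FROM loan_application la
    GROUP BY la.user_id
    HAVING COUNT(la.id) = 3
"""

# With exactly three rows for the user, the group passes the HAVING filter.
rows = conn.execute(query).fetchall()
print(rows)  # [(3, 3)]

# Hypothetical fourth row: the count becomes 4, so the group is filtered out.
conn.execute("INSERT INTO loan_application VALUES (4, 10)")
rows_after = conn.execute(query).fetchall()
print(rows_after)  # []
```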

Query1 Join with Query2 but Query2 use the result of Query1, How to do that with just 1 query?

OK, I have a complicated query to get all article details. Each article has many versions. I need to get the details of an article for its latest version only (i.e. take the max version). Here is the table:
+------------+-----------+----------+
| ArticleID | Detail | Version |
+------------+-----------+----------+
| 1 | detail1 | 1 |
| 1 | detail2 | 1 |
| 1 | detail3 | 2 |
| 1 | detail4 | 2 |
| 3 | detail3 | 2 |
| 3 | detail6 | 2 |
| 3 | detail4 | 3 |
+------------+-----------+----------+
Now the user just provides a detail, and the query should return all matching details of all articles where version = max(version).
Suppose that we don't care about the max version; then a simple query could be:
Select * from articleTb where Detail like '%3'
It will print out:
+------------+-----------+----------+
| ArticleID | Detail | Version |
+------------+-----------+----------+
| 1 | detail3 | 2 |
| 3 | detail3 | 2 |
+------------+-----------+----------+
But this doesn't meet the requirement, because the result should not include the record 3 - detail3 - 2: it doesn't carry the max version of articleID=3.
Let's say the user searches for Detail like '%4'; then the correct result should be:
+------------+-----------+----------+
| ArticleID | Detail | Version |
+------------+-----------+----------+
| 1 | detail4 | 2 |
| 3 | detail4 | 3 |
+------------+-----------+----------+
Both records appear because they belong to the max version of their article: 2 is the max version of articleID=1, and 3 is the max version of articleID=3, so both match the condition.
So here is what I did:
select * from (Select * from articleTb where Detail like '%3') tb1
Join (select articleID, max(version) maxversion from articleTb where
Detail like '%3' group by articleID) tb2
on tb1.articleID=tb2.articleID and tb1.version=tb2.maxversion
However, the above query has to repeat the filter where Detail like '%3', which is not good. Besides, my real-world query1 is much more complicated than where Detail like '%3', so doing it this way means the same work is done twice, which is very inefficient.
So how do I deal with this problem?

To improve performance, remove the unnecessary inline view, e.g.
SELECT tb1.*
FROM articleTb tb1
JOIN ( SELECT b.articleID
, MAX(b.version) AS maxversion
FROM articleTb b
WHERE b.Detail LIKE '%3'
GROUP BY b.articleID
) tb2
ON tb1.articleID = tb2.articleID
AND tb1.version = tb2.maxversion
WHERE tb1.Detail LIKE '%3'
and...
make sure you have appropriate indexes. A covering index with a leading column of articleID may enable MySQL to use the index to optimize the GROUP BY (avoiding a "Using filesort" operation).
... ON articleTb (articleID, version, detail)
MySQL may also be able to use that index for the join to tb1; the derived table (inline view) won't have an index.
You can confirm the execution plan with an EXPLAIN.

I would use a CTE to create a table that contains the article id and the version id, then use that in my main query to filter down to the most recent version.
with latest as
(
select articleId, max(version) as version from articleTb
group by articleId
)
select ....
from articleTb a
inner join latest l on a.articleid = l.articleid and l.version = a.version

Using an aggregate table will be helpful.
Let me describe a scenario first. On day 1, you get a flat file for the first time.
1. Load it into a staging table.
2. Find ArticleID and MAX(Version) for each ArticleID, and store them in the aggregate table.
3. Left outer join the staging table with the aggregate table on ArticleID. Pick the higher version. This will lead to your result.
4. Truncate the staging table.
The next day, when a new feed arrives, the file is again loaded into the truncated staging table and left joined.
You can add a few audit fields to the aggregate table, such as the date the file arrived and maybe the file name too. I used this method in a project at an insurance company, and it resulted in a several-fold performance gain.

This is your query:
select *
from (Select * from articleTb where Detail like '%3'
) tb1 Join
(select articleID, max(version) maxversion
from articleTb
where Detail like '%3'
group by articleID
) tb2
on tb1.articleID=tb2.articleID and tb1.version=tb2.maxversion;
You are trying to get the last version of a particular type of article. Another approach is to use not exists:
select *
from articleTb t
where Detail like '%3' and
not exists (select 1
from articleTb t2
where t2.articleID = t.articleID and
t2.Detail like '%3' and
t2.version > t.version
);
This is saying: "Get me all the rows from articleTb where Detail ends in 3 and there isn't another version that is higher".
To improve performance, create an index on articleTb(articleID, Detail, version). The one question is whether t2.Detail like '%3' is needed in the subquery: does that condition filter articles, or versions within an article? If it is not needed, then remove the condition and change the index to articleTb(articleID, version).
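To see how the NOT EXISTS form behaves on the question's sample data, here is a minimal sketch. It assumes Python's sqlite3 with an in-memory SQLite database in place of MySQL, and it uses the variant without the Detail condition in the subquery, so "latest" means the highest version of the whole article. The helper name latest_matches is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articleTb (ArticleID INTEGER, Detail TEXT, Version INTEGER)")
conn.executemany(
    "INSERT INTO articleTb VALUES (?, ?, ?)",
    [
        (1, "detail1", 1), (1, "detail2", 1), (1, "detail3", 2), (1, "detail4", 2),
        (3, "detail3", 2), (3, "detail6", 2), (3, "detail4", 3),
    ],
)


def latest_matches(pattern):
    """Return rows matching the pattern whose article has no higher version."""
    return conn.execute(
        """
        SELECT t.ArticleID, t.Detail, t.Version
        FROM articleTb t
        WHERE t.Detail LIKE ?
          AND NOT EXISTS (
              SELECT 1
              FROM articleTb t2
              WHERE t2.ArticleID = t.ArticleID
                AND t2.Version > t.Version
          )
        ORDER BY t.ArticleID
        """,
        (pattern,),
    ).fetchall()


print(latest_matches("%3"))  # [(1, 'detail3', 2)]
print(latest_matches("%4"))  # [(1, 'detail4', 2), (3, 'detail4', 3)]
```

The search pattern appears only once, which addresses the concern in the question about repeating the filter.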

MySQL: Transfer Data Based on a Column Without Also Transferring That Column

My table stores revision data for my CMS entries. Each entry has an ID and a revision date, and there are multiple revisions:
Table: old_revisions
+----------+---------------+-----------------------------------------+
| entry_id | revision_date | entry_data |
+----------+---------------+-----------------------------------------+
| 1 | 1302150011 | I like pie. |
| 1 | 1302148411 | I like pie and cookies. |
| 1 | 1302149885 | I like pie and cookies and cake. |
| 2 | 1288917372 | Kittens are cute. |
| 2 | 1288918782 | Kittens are cute but puppies are cuter. |
| 3 | 1288056095 | Han shot first. |
+----------+---------------+-----------------------------------------+
I want to transfer some of this data to another table:
Table: new_revisions
+--------------+----------------+
| new_entry_id | new_entry_data |
+--------------+----------------+
| | |
+--------------+----------------+
I want to transfer entry_id and entry_data to new_entry_id and new_entry_data. But I only want to transfer the most recent version of each entry.
I got as far as this query:
INSERT INTO new_revisions (
new_entry_id,
new_entry_data
)
SELECT
entry_id,
entry_data,
MAX(revision_date)
FROM old_revisions
GROUP BY entry_id
But I think the problem is that I'm trying to insert 3 columns of data into 2 columns.
How do I transfer the data based on the revision date without transferring the revision date as well?

You can use the following query:
insert into new_revisions (new_entry_id, new_entry_data)
select o1.entry_id, o1.entry_data
from old_revisions o1
inner join
(
select max(revision_date) maxDate, entry_id
from old_revisions
group by entry_id
) o2
on o1.entry_id = o2.entry_id
and o1.revision_date = o2.maxDate
See SQL Fiddle with Demo. This query gets the max(revision_date) for each entry_id and then joins back to your table on both the entry_id and the max date to get the rows to be inserted.
Please note that the subquery returns only the entry_id and the date. This is because we want the GROUP BY to cover every item in the select list that is not inside an aggregate function. MySQL uses an extension to the GROUP BY clause that allows columns in the select list to be left out of both the GROUP BY and any aggregate, but this can cause unexpected results. Including only the columns needed by the aggregate and the GROUP BY ensures that the result is the value you want. (See MySQL Extensions to GROUP BY.)
From the MySQL Docs:
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. ... You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values the server chooses.

If you want to insert only the latest revision of each entry, you first need to find it:
select entry_id, max(revision_date) as maxDate
from old_revisions
group by entry_id;
Then use this as a subquery to filter the data you need:
insert into new_revisions (new_entry_id, new_entry_data)
select o.entry_id, o.entry_data
from old_revisions as o
inner join (
select entry_id, max(revision_date) as maxDate
from old_revisions
group by entry_id
) as a on o.entry_id = a.entry_id and o.revision_date = a.maxDate
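The INSERT ... SELECT with the join back on the latest revision_date can be tried on the question's data with a minimal sketch like this one. It assumes Python's sqlite3 with an in-memory SQLite database standing in for MySQL; the variable name transferred is only illustrative.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE old_revisions (entry_id INTEGER, revision_date INTEGER, entry_data TEXT)"
)
conn.execute("CREATE TABLE new_revisions (new_entry_id INTEGER, new_entry_data TEXT)")
conn.executemany(
    "INSERT INTO old_revisions VALUES (?, ?, ?)",
    [
        (1, 1302150011, "I like pie."),
        (1, 1302148411, "I like pie and cookies."),
        (1, 1302149885, "I like pie and cookies and cake."),
        (2, 1288917372, "Kittens are cute."),
        (2, 1288918782, "Kittens are cute but puppies are cuter."),
        (3, 1288056095, "Han shot first."),
    ],
)

# revision_date is used only to pick the row; it is not part of the inserted columns.
conn.execute(
    """
    INSERT INTO new_revisions (new_entry_id, new_entry_data)
    SELECT o.entry_id, o.entry_data
    FROM old_revisions AS o
    INNER JOIN (
        SELECT entry_id, MAX(revision_date) AS maxDate
        FROM old_revisions
        GROUP BY entry_id
    ) AS a
      ON o.entry_id = a.entry_id AND o.revision_date = a.maxDate
    """
)

transferred = conn.execute(
    "SELECT new_entry_id, new_entry_data FROM new_revisions ORDER BY new_entry_id"
).fetchall()
for row in transferred:
    print(row)
```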

MySQL - Exclude rows from Select based on duplication of two columns

I am attempting to narrow results of an existing complex query based on conditional matches on multiple columns within the returned data set. I'll attempt to simplify the data as much as possible here.
Assume that the following table structure represents the data that my existing complex query has already selected (here ordered by date):
+----+-----------+------+------------+
| id | remote_id | type | date |
+----+-----------+------+------------+
| 1 | 1 | A | 2011-01-01 |
| 3 | 1 | A | 2011-01-07 |
| 5 | 1 | B | 2011-01-07 |
| 4 | 1 | A | 2011-05-01 |
+----+-----------+------+------------+
I need to select from that data set based on the following criteria:
If the pairing of remote_id and type is unique to the set, return the row always
If the pairing of remote_id and type is not unique to the set, take the following action:
Of the sets of rows for which the pairing of remote_id and type are not unique, return only the single row for which date is greatest and still less than or equal to now.
So, if today is 2011-01-10, I'd like the data set returned to be:
+----+-----------+------+------------+
| id | remote_id | type | date |
+----+-----------+------+------------+
| 3 | 1 | A | 2011-01-07 |
| 5 | 1 | B | 2011-01-07 |
+----+-----------+------+------------+
For some reason I'm having no luck wrapping my head around this one. I suspect the answer lies in good application of group by, but I just can't grasp it. Any help is greatly appreciated!

/* Pairs with exactly one row: always return, regardless of when the date occurs */
SELECT MAX(id) AS id, remote_id, type, MAX(date) AS date
FROM YourTable
GROUP BY remote_id, type
HAVING COUNT(*) = 1
UNION
/* Pairs with more than one row: return the row with the greatest date <= today */
SELECT yt.id, yt.remote_id, yt.type, yt.date
FROM YourTable yt
INNER JOIN (SELECT remote_id, type,
MAX(CASE WHEN date <= DATE(NOW()) THEN date END) AS maxdate
FROM YourTable
GROUP BY remote_id, type
HAVING COUNT(*) > 1) sq
ON yt.remote_id = sq.remote_id
AND yt.type = sq.type
AND yt.date = sq.maxdate
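To check the rule from the question against its sample data, here is a minimal sketch that folds both cases into one grouped subquery. It assumes Python's sqlite3 with an in-memory SQLite database in place of MySQL, ISO-formatted date strings, and a :today parameter instead of NOW() so the result is reproducible. The helper name select_rows and the alias pick_date are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, remote_id INTEGER, type TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        (1, 1, "A", "2011-01-01"),
        (3, 1, "A", "2011-01-07"),
        (5, 1, "B", "2011-01-07"),
        (4, 1, "A", "2011-05-01"),
    ],
)


def select_rows(today):
    """Unique pairs keep their only row; other pairs keep the latest row dated <= today."""
    return conn.execute(
        """
        SELECT yt.id, yt.remote_id, yt.type, yt.date
        FROM YourTable yt
        INNER JOIN (
            SELECT remote_id, type,
                   CASE WHEN COUNT(*) = 1 THEN MAX(date)
                        ELSE MAX(CASE WHEN date <= :today THEN date END)
                   END AS pick_date
            FROM YourTable
            GROUP BY remote_id, type
        ) sq
          ON yt.remote_id = sq.remote_id
         AND yt.type = sq.type
         AND yt.date = sq.pick_date
        ORDER BY yt.id
        """,
        {"today": today},
    ).fetchall()


print(select_rows("2011-01-10"))
# [(3, 1, 'A', '2011-01-07'), (5, 1, 'B', '2011-01-07')]
```

Counting all rows of a pair before applying the date cutoff matters here: filtering by date first would make a pair with one past and one future row look unique.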

The GROUP BY clause groups together all rows that have identical values in one or more columns and returns one row in the result set for each group. If you use aggregate functions (MIN, MAX, SUM, AVG, etc.), they are applied to each "group".
SELECT id, remote_id, type, max(date)
FROM blah
GROUP BY remote_id, type;
I'm not sure where today's date comes in, but I assumed that was part of the complex query that you didn't describe, and that it isn't directly relevant to your question here.

Try this:
SELECT a.*
FROM YourTable a INNER JOIN
(
SELECT remote_id, type,
/* a unique pair keeps its only date; otherwise take the latest date up to today */
CASE WHEN COUNT(1) = 1 THEN MAX(date)
ELSE MAX(CASE WHEN date <= DATE(NOW()) THEN date END) END AS date
FROM YourTable
GROUP BY remote_id, type
) b
ON a.remote_id = b.remote_id
AND a.type = b.type
AND a.date = b.date

Try this:
select id, remote_id, type, MAX(date) from YourTable
group by remote_id, type

Hey Carson! You could try using the "distinct" keyword on those two fields, and in a union you can use Count() along with group by and some operators to pull non-unique (greatest and less-than today) records!

Group by - Overriding default behaviour of deciding row under each group in result

Extending further from this question Query to find top rated article in each category -
Consider the same table -
id | category_id | rating
---+-------------+-------
1 | 1 | 10
2 | 1 | 8
3 | 2 | 7
4 | 3 | 5
5 | 3 | 2
6 | 3 | 6
There is a table articles, with fields id, rating (an integer from 1-10), and category_id (an integer representing the category it belongs to). Suppose I have the same goal: to get the top-rated article in each category (this should be the result):
Desired Result
id | category_id | rating
---+-------------+-------
1 | 1 | 10
3 | 2 | 7
6 | 3 | 6
Extension of original question
But, running the following query -
SELECT id, category_id, max( rating ) AS max_rating
FROM `articles`
GROUP BY category_id
results in the following, where everything except the id field is as desired. I know how to do this with a subquery, as answered in the same question (Using subquery).
id | category_id | max_rating
---+-------------+-----------
1 | 1 | 10
3 | 2 | 7
4 | 3 | 6
In generic terms
Excluding the grouped column (category_id) and the evaluated columns (columns returning results of aggregate function like SUM(), MAX() etc. - in this case max_rating), the values returned in the other fields are simply the first row under every grouped result set (grouped by category_id in this case). E.g. the record with id =1 is the first one in the table under category_id 1 (id 1 and 2 under category_id 1) so it is returned.
I am just wondering is it not possible to somehow overcome this default behavior to return rows based on conditions? If mysql can perform calculation for every grouped result set (does MAX() counting etc) then why can't it return the row corresponding to the maximum rating. Is it not possible to do this in a single query without a subquery? This looks to me like a frequent requirement.
Update
I could not figure out what I want from Naktibalda's solution too. And just to mention again, I know how to do this using a subquery, as again answered by OMG Ponies.

Use:
SELECT x.id,
x.category_id,
x.rating
FROM YOUR_TABLE x
JOIN (SELECT t.category_id,
MAX(t.rating) AS max_rating
FROM YOUR_TABLE t
GROUP BY t.category_id) y ON y.category_id = x.category_id
AND y.max_rating = x.rating
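A minimal sketch of this join on the question's data follows, assuming Python's sqlite3 with an in-memory SQLite database in place of MySQL and the table name articles from the question. The variable name top_rated is only illustrative.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER, category_id INTEGER, rating INTEGER)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 8), (3, 2, 7), (4, 3, 5), (5, 3, 2), (6, 3, 6)],
)

# The subquery finds the best rating per category; the join recovers the
# id of the row that actually carries that rating.
top_rated = conn.execute(
    """
    SELECT x.id, x.category_id, x.rating
    FROM articles x
    JOIN (
        SELECT t.category_id, MAX(t.rating) AS max_rating
        FROM articles t
        GROUP BY t.category_id
    ) y
      ON y.category_id = x.category_id
     AND y.max_rating = x.rating
    ORDER BY x.category_id
    """
).fetchall()
print(top_rated)  # [(1, 1, 10), (3, 2, 7), (6, 3, 6)]
```

Note that if two articles in a category tie for the top rating, this form returns both of them.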
