MySQL: Transfer Data Based on a Column Without Also Transferring That Column - mysql

My table stores revision data for my CMS entries. Each entry has an ID and a revision date, and there are multiple revisions:
Table: old_revisions
+----------+---------------+-----------------------------------------+
| entry_id | revision_date | entry_data |
+----------+---------------+-----------------------------------------+
| 1 | 1302150011 | I like pie. |
| 1 | 1302148411 | I like pie and cookies. |
| 1 | 1302149885 | I like pie and cookies and cake. |
| 2 | 1288917372 | Kittens are cute. |
| 2 | 1288918782 | Kittens are cute but puppies are cuter. |
| 3 | 1288056095 | Han shot first. |
+----------+---------------+-----------------------------------------+
I want to transfer some of this data to another table:
Table: new_revisions
+--------------+----------------+
| new_entry_id | new_entry_data |
+--------------+----------------+
| | |
+--------------+----------------+
I want to transfer entry_id and entry_data to new_entry_id and new_entry_data. But I only want to transfer the most recent version of each entry.
I got as far as this query:
INSERT INTO new_revisions (
new_entry_id,
new_entry_data
)
SELECT
entry_id,
entry_data,
MAX(revision_date)
FROM old_revisions
GROUP BY entry_id
But I think the problem is that I'm trying to insert 3 columns of data into 2 columns.
How do I transfer the data based on the revision date without transferring the revision date as well?

You can use the following query:
insert into new_revisions (new_entry_id, new_entry_data)
select o1.entry_id, o1.entry_data
from old_revisions o1
inner join
(
select max(revision_date) maxDate, entry_id
from old_revisions
group by entry_id
) o2
on o1.entry_id = o2.entry_id
and o1.revision_date = o2.maxDate
See SQL Fiddle with Demo. This query gets the max(revision_date) for each entry_id and then joins back to your table on both the entry_id and the max date to get the rows to be inserted.
Note that the subquery returns only the entry_id and the date, because every column in its select list should either be aggregated or be named in the GROUP BY. MySQL has an extension to GROUP BY that lets the select list include columns that are neither, but this can cause unexpected results. Including only the columns needed by the aggregate and the GROUP BY ensures that the result is the value you want (see MySQL Extensions to GROUP BY).
From the MySQL Docs:
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. ... You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values the server chooses.
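The join-back query can be checked against the sample rows from the question. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for MySQL (the column types are guesses, not taken from the original schema):

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; column types are guessed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_revisions (entry_id INTEGER, revision_date INTEGER, entry_data TEXT);
    CREATE TABLE new_revisions (new_entry_id INTEGER, new_entry_data TEXT);
    INSERT INTO old_revisions VALUES
        (1, 1302150011, 'I like pie.'),
        (1, 1302148411, 'I like pie and cookies.'),
        (1, 1302149885, 'I like pie and cookies and cake.'),
        (2, 1288917372, 'Kittens are cute.'),
        (2, 1288918782, 'Kittens are cute but puppies are cuter.'),
        (3, 1288056095, 'Han shot first.');
""")

# The grouped subquery finds the latest revision_date per entry; joining back
# on (entry_id, revision_date) picks the matching rows. Only two columns are
# selected, so the date itself is never transferred.
conn.execute("""
    INSERT INTO new_revisions (new_entry_id, new_entry_data)
    SELECT o1.entry_id, o1.entry_data
    FROM old_revisions o1
    INNER JOIN (
        SELECT entry_id, MAX(revision_date) AS maxDate
        FROM old_revisions
        GROUP BY entry_id
    ) o2
      ON o1.entry_id = o2.entry_id
     AND o1.revision_date = o2.maxDate
""")

transferred = conn.execute(
    "SELECT new_entry_id, new_entry_data FROM new_revisions ORDER BY new_entry_id"
).fetchall()
for row in transferred:
    print(row)
```

In the sample data the largest revision_date for entry 1 belongs to 'I like pie.', so that is the row that gets copied.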

If you want to insert only the latest revision of each entry, first find the latest revision date per entry:
select entry_id, max(revision_date) as maxDate
from old_revisions
group by entry_id;
Then use this as a subquery to filter the data you need:
insert into new_revisions (new_entry_id, new_entry_data)
select o.entry_id, o.entry_data
from old_revisions as o
inner join (
select entry_id, max(revision_date) as maxDate
from old_revisions
group by entry_id
) as a on o.entry_id = a.entry_id and o.revision_date = a.maxDate
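The same "latest row per entry" filter can also be written without GROUP BY. This is an alternative form that none of the answers uses: a correlated NOT EXISTS keeps a row only when no newer revision of the same entry exists. The sketch again assumes SQLite in place of MySQL and a shortened copy of the sample data:

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; the data is a subset of the sample.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_revisions (entry_id INTEGER, revision_date INTEGER, entry_data TEXT);
    CREATE TABLE new_revisions (new_entry_id INTEGER, new_entry_data TEXT);
    INSERT INTO old_revisions VALUES
        (2, 1288917372, 'Kittens are cute.'),
        (2, 1288918782, 'Kittens are cute but puppies are cuter.'),
        (3, 1288056095, 'Han shot first.');
""")

# Keep a revision only if there is no other revision of the same entry
# with a strictly greater revision_date.
conn.execute("""
    INSERT INTO new_revisions (new_entry_id, new_entry_data)
    SELECT o.entry_id, o.entry_data
    FROM old_revisions AS o
    WHERE NOT EXISTS (
        SELECT 1
        FROM old_revisions AS newer
        WHERE newer.entry_id = o.entry_id
          AND newer.revision_date > o.revision_date
    )
""")

latest_only = conn.execute(
    "SELECT new_entry_id, new_entry_data FROM new_revisions ORDER BY new_entry_id"
).fetchall()
print(latest_only)
```

Both this form and the join on the maximum date keep every row that ties for the latest revision_date, so an entry with two revisions at the same timestamp would be inserted twice.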

Related

Fetching unique records from duplicate data without using GROUP BY

I have some problem with GROUP BY. For some reason I cannot use it. So I am looking for some alternative solution. I am using online webhosting and with GROUP BY I am getting this error:
#1055 - Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'denieuwe_db.guest.gid' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
Bit Intro: I am making a restaurant website and in the Guest table I have all the guest_code which is unique. Inside the purchase table there are all the purchases of the guest with purchaseid & guest_code
Table: Purchase
+------------------------+
|purchaseid | guest_code |
+------------------------+
| 1 | CUST7408668|
| 2 | CUST7408668|
| 3 | CUST4425874|
| 4 | CUST4425874|
| 5 | CUST4425874|
+------------------------+
Table: guest
+-----------------------------------+
| gid | guest_code | Name |
+-----------------------------------+
| 1 | CUST7408668| Mia |
| 5 | CUST4425874| zoi |
+------------------------+----------+
SQL:
SELECT purchase.purchaseid,
purchase.guest_code,
guest.guest_code,
guest.name
FROM purchase
INNER JOIN guest ON purchase.guest_code=guest.guest_code GROUP BY purchase.guest_code;
What I want in result:
+-----------------------------------+--------------------+
|purchaseid | guest_code | name | gid | guest_code |
+-----------------------------------+--------------------+
| 1 | CUST7408668| Mia | 1 | CUST7408668|
| 3 | CUST4425874| zoi | 5 | CUST4425874|
+------------------------+----------+--------------------+
I get exactly this result using GROUP BY on localhost, but the query fails on my hosting, so I am looking for an alternative solution. Any help is appreciated.
It seems like you want the minimum purchase ID. You can use min() for that. You might also need to extend the GROUP BY clause with the other columns.
SELECT min(purchase.purchaseid) purchaseid,
purchase.guest_code,
guest.guest_code,
guest.name
FROM purchase
INNER JOIN guest
ON purchase.guest_code = guest.guest_code
GROUP BY purchase.guest_code,
guest.guest_code,
guest.name;
You can use DISTINCT to remove duplicate rows, since you are not doing any aggregation. Note that DISTINCT applies to the whole select list, not only to the column in parentheses, so rows collapse only when every selected column matches:
SELECT DISTINCT( purchase.guest_code ),
purchase.purchaseid,
guest.guest_code,
guest.NAME
FROM purchase
INNER JOIN guest
ON purchase.guest_code = guest.guest_code
Perhaps the simplest way is to group by guest_code only, since the other guest columns are determined by it:
SELECT MIN(purchase.purchaseid) AS purchaseid,
purchase.guest_code,
guest.Name,
guest.gid,
guest.guest_code
FROM purchase
INNER JOIN guest
ON purchase.guest_code=guest.guest_code
GROUP BY purchase.guest_code
ORDER BY 1;
Aggregate before joining:
SELECT p.purchaseid, p.guest_code, g.name
FROM (SELECT MIN(p.purchaseid) as purchaseid, p.guest_code
FROM purchase p
GROUP BY p.guest_code
) p JOIN
guest g
ON p.guest_code = g.guest_code;
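To see the "aggregate before joining" idea at work on the question's data, here is a small sketch. It assumes SQLite in place of MySQL and guessed column types. The outer query never uses GROUP BY, so no nonaggregated column can trip ONLY_FULL_GROUP_BY there:

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; column types are guessed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE guest (gid INTEGER, guest_code TEXT, name TEXT);
    CREATE TABLE purchase (purchaseid INTEGER, guest_code TEXT);
    INSERT INTO guest VALUES (1, 'CUST7408668', 'Mia'), (5, 'CUST4425874', 'zoi');
    INSERT INTO purchase VALUES
        (1, 'CUST7408668'), (2, 'CUST7408668'),
        (3, 'CUST4425874'), (4, 'CUST4425874'), (5, 'CUST4425874');
""")

# The derived table reduces purchase to one row per guest_code first;
# the join to guest then adds the guest columns without any grouping.
first_purchases = conn.execute("""
    SELECT p.purchaseid, p.guest_code, g.name, g.gid
    FROM (
        SELECT MIN(purchaseid) AS purchaseid, guest_code
        FROM purchase
        GROUP BY guest_code
    ) AS p
    JOIN guest AS g ON p.guest_code = g.guest_code
    ORDER BY p.purchaseid
""").fetchall()
for row in first_purchases:
    print(row)
```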
QUESTION UPDATE
I contacted the Webhost technical team and this is what they say about this issue.
Hi,
on shared hosting this is not allowed as you change the database
server behavior for ALL other websites.
But you can tweak your SQL queries to get a similar behavior by using
ANY_VALUE() for each nonaggregated column.
eg.
SELECT name, address, MAX(age) FROM t GROUP BY name;
fails, because address is not in the GROUP BY.
using:
SELECT name, ANY_VALUE(address), MAX(age) FROM t GROUP BY name;
works on the other side and produces similar result to disabling
ONLY_FULL_GROUP_BY
Explanation from MySQL:
If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard
SQL use of GROUP BY permits the select list, HAVING condition, or
ORDER BY list to refer to nonaggregated columns even if the columns
are not functionally dependent on GROUP BY columns. This causes MySQL
to accept the preceding query. In this case, the server is free to
choose any value from each group, so unless they are the same, the
values chosen are nondeterministic, which is probably not what you
want. Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause. Result set sorting occurs
after values have been chosen, and ORDER BY does not affect which
value within each group the server chooses. Disabling
ONLY_FULL_GROUP_BY is useful primarily when you know that, due to some
property of the data, all values in each nonaggregated column not
named in the GROUP BY are the same for each group.
You can achieve the same effect without disabling ONLY_FULL_GROUP_BY
by using ANY_VALUE() to refer to the nonaggregated column.

MySQL COUNT(DISTINCT) giving wrong values with GROUP BY

I have a table that contains custom user analytics data. I was able to pull the number of unique users with a query:
SELECT COUNT(DISTINCT(user_id)) AS 'unique_users'
FROM `events`
WHERE client_id = 123
And this will return 16728
This table also has a column of type DATETIME that I would like to group the counts by. However, if I add a GROUP BY to the end of it, everything groups properly it seems except the totals don't match. My new query is this:
SELECT COUNT(DISTINCT(user_id)) AS 'unique_users', DATE(server_stamp) AS 'date'
FROM `events`
WHERE client_id = 123
GROUP BY DATE(server_stamp)
Now I get the following values:
|-----------------------------|
| unique_users | date |
|---------------|-------------|
| 2650 | 2019-08-26 |
| 3486 | 2019-08-27 |
| 3475 | 2019-08-28 |
| 3631 | 2019-08-29 |
| 3492 | 2019-08-30 |
|-----------------------------|
Totaling to 16734. I tried using a sub query to get the distinct users then count and group in the main query but no luck there. Any help in this would be greatly appreciated. Let me know if there is further information to help diagnosis.
A user who has events on multiple days (e.g. a session that starts before midnight and ends after it) is counted once for each of those days in the new query. The first query applies DISTINCT over all rows at once, while the second only removes duplicates within each group; identical values in different groups are left untouched.
So when a query combines COUNT(DISTINCT ...) with a GROUP BY, the grouping happens first and the DISTINCT is applied inside each group. In general you therefore cannot assume that the COUNT(DISTINCT user_id) of the first query equals the sum of the per-group COUNT(DISTINCT user_id) values.
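The effect is easy to reproduce with a handful of rows. The sketch below uses made-up events (three users, one of them active across midnight) and assumes SQLite in place of MySQL; its DATE() function behaves the same way for these timestamp strings:

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, client_id INTEGER, server_stamp TEXT);
    INSERT INTO events VALUES
        (1, 123, '2019-08-26 23:58:00'),
        (1, 123, '2019-08-27 00:03:00'),
        (2, 123, '2019-08-26 10:00:00'),
        (3, 123, '2019-08-27 12:00:00');
""")

# DISTINCT over all rows at once: each user is counted once.
overall = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM events WHERE client_id = 123"
).fetchone()[0]

# DISTINCT inside each day: user 1 is counted on both days.
per_day = conn.execute("""
    SELECT DATE(server_stamp) AS day, COUNT(DISTINCT user_id)
    FROM events
    WHERE client_id = 123
    GROUP BY DATE(server_stamp)
    ORDER BY day
""").fetchall()
daily_total = sum(count for _, count in per_day)

print(overall, per_day, daily_total)
```

Here the per-day counts add up to one more than the overall count, because user 1 appears in two groups.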
Xandor is absolutely correct. If a user has events on two different days, there is no way your second query can collapse them into one. If you need the data grouped by date, you can try the query below, which counts each user only on their first day:
SELECT COUNT(user_id) AS 'unique_users', MIN_DATE AS 'date'
FROM (SELECT user_id, MIN(DATE(server_stamp)) MIN_DATE -- Might be MAX
FROM `events`
WHERE client_id = 123
GROUP BY user_id) X
GROUP BY MIN_DATE;
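A quick way to convince yourself that attributing each user to a single day makes the daily counts add up is to run that shape of query on a few invented rows. This sketch assumes SQLite in place of MySQL, and the alias first_day is only illustrative:

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, client_id INTEGER, server_stamp TEXT);
    INSERT INTO events VALUES
        (1, 123, '2019-08-26 23:58:00'),
        (1, 123, '2019-08-27 00:03:00'),
        (2, 123, '2019-08-26 10:00:00'),
        (3, 123, '2019-08-27 12:00:00');
""")

# Inner query: one row per user with the first day they were seen.
# Outer query: count users per first day, so nobody is counted twice.
first_day_counts = conn.execute("""
    SELECT first_day, COUNT(*) AS unique_users
    FROM (
        SELECT user_id, MIN(DATE(server_stamp)) AS first_day
        FROM events
        WHERE client_id = 123
        GROUP BY user_id
    ) AS firsts
    GROUP BY first_day
    ORDER BY first_day
""").fetchall()
total_users = sum(count for _, count in first_day_counts)

print(first_day_counts, total_users)
```

The trade-off is that a day's figure now means "users first seen that day", not "users active that day".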

MySQL group/order behaves differently in 5.7

I have a table that looks like this:
id | text | language_id | other_id | dateCreated
1 | something | 1 | 5 | 2015-01-02
2 | something | 1 | 5 | 2015-01-01
3 | something | 2 | 5 | 2015-01-01
4 | something | 2 | 6 | 2015-01-01
and I want to get all latest rows for each language_id that have other_id 5.
my query looks like this
SELECT * FROM (
SELECT *
FROM tbl
WHERE other_id = 5
ORDER BY dateCreated DESC
) AS r
GROUP BY r.language_id
With MySQL 5.6 I get 2 rows with ID 1 and 3, which is what I want.
With MySQL 5.7.10 I get 2 rows with IDs 2 and 3 and it seems to me that the ORDER BY in the subquery is ignored.
Any ideas what might be the problem ?
You should go with the query below:
SELECT
*
FROM tbl
INNER JOIN
(
SELECT
other_id,
language_id,
MAX(dateCreated) max_date_created
FROM tbl
WHERE other_id = 5
GROUP BY other_id, language_id
) AS t
ON tbl.language_id = t.language_id AND tbl.other_id = t.other_id AND
tbl.dateCreated = t.max_date_created
Using GROUP BY without an aggregate function returns an arbitrary row from each group. You should not rely on which row GROUP BY returns; MySQL makes no guarantee about it.
Quoting from this post
In a nutshell, MySQL allows omitting some columns from the GROUP BY,
for performance purposes, however this works only if the omitted
columns all have the same value (within a grouping), otherwise, the
value returned by the query are indeed indeterminate, as properly
guessed by others in this post. To be sure adding an ORDER BY clause
would not re-introduce any form of deterministic behavior.
Although not at the core of the issue, this example shows how using *
rather than an explicit enumeration of desired columns is often a bad
idea.
Excerpt from MySQL 5.0 documentation:
When using this feature, all rows in each group should have the same
values for the columns that are omitted from the GROUP BY part. The
server is free to return any value from the group, so the results are
indeterminate unless all values are the same.
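For the four sample rows in the question, joining back on the per-language maximum date returns ids 1 and 3 regardless of any ordering. A minimal sketch, assuming SQLite in place of MySQL and text dates in ISO format:

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; dates are stored as ISO-format text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (id INTEGER, text TEXT, language_id INTEGER,
                      other_id INTEGER, dateCreated TEXT);
    INSERT INTO tbl VALUES
        (1, 'something', 1, 5, '2015-01-02'),
        (2, 'something', 1, 5, '2015-01-01'),
        (3, 'something', 2, 5, '2015-01-01'),
        (4, 'something', 2, 6, '2015-01-01');
""")

# The subquery computes the latest dateCreated per language for other_id = 5;
# the join keeps only the rows that carry exactly that date.
latest_ids = [row[0] for row in conn.execute("""
    SELECT tbl.id
    FROM tbl
    INNER JOIN (
        SELECT other_id, language_id, MAX(dateCreated) AS max_date_created
        FROM tbl
        WHERE other_id = 5
        GROUP BY other_id, language_id
    ) AS t
      ON tbl.language_id = t.language_id
     AND tbl.other_id = t.other_id
     AND tbl.dateCreated = t.max_date_created
    ORDER BY tbl.id
""")]
print(latest_ids)
```

Because the choice of row is expressed through MAX() and the join condition, it does not depend on how the server happens to pick rows within a group.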

select min value of a field from joins table

Hi guys! I have three tables:
Products
Categories
Prices
A product belongs to one category and may have several prices.
Consider this set of data:
Product:
id | title | featured | category_id
1 | bread | yes | 99
2 | milk | yes | 99
3 | honey | yes | 99
Price:
id | product_id | price | quantity
1 | 1 | 99.99 | 10
2 | 1 | 150.00 | 50
3 | 2 | 33.10 | 20
4 | 2 | 10.00 | 11
I need to create a view listing every product together with its category and its minimum price.
e.g.
id | title | featured | cat.name | price | quantity
1 | bread | yes | food | 99.99 | 10
I tried the following query, but it only gets the minimum Price.price right; Price.quantity, for example, comes from another row. I need to find the row with the minimum Price.price and take Price.quantity from that same row.
CREATE VIEW products_view
AS
SELECT `Prod`.`id`, `Prod`.`title`, `Prod`.`featured`, `Cat`.`name`, MIN(`Price`.`price`) as price,`Price`.`quantity`
FROM `products` AS `Prod`
LEFT JOIN `prices` AS `Price` ON (`Price`.`product_id` = `Prod`.`id`)
LEFT JOIN `categories` AS `Cat` ON (`Prod`.`category_id` = `Cat`.`id`)
GROUP BY `Prod`.`id`
ORDER BY `Prod`.`id` ASC
My result is:
id | title | featured | cat.name | price | quantity
1 | bread | yes | food | 99.99 | **50** <-- wrong
Can you help me ? Thx in advance !
As documented under MySQL Extensions to GROUP BY (emphasis added):
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL because the name column in the select list does not appear in the GROUP BY:
SELECT o.custid, c.name, MAX(o.payment)
FROM orders AS o, customers AS c
WHERE o.custid = c.custid
GROUP BY o.custid;
For the query to be legal, the name column must be omitted from the select list or named in the GROUP BY clause.
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.
What you are looking for is the group-wise minimum, which can be obtained by joining the grouped results back to the table:
SELECT Prod.id, Prod.title, Prod.featured, Cat.name, Price.price, Price.quantity
FROM products AS Prod
LEFT JOIN categories AS Cat ON Prod.category_id = Cat.id
LEFT JOIN (
prices AS Price NATURAL JOIN (
SELECT product_id, MIN(price) AS price
FROM prices
GROUP BY product_id
) t
) ON Price.product_id = Prod.id
ORDER BY Prod.id
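The group-wise minimum can be tried out on the question's data. The sketch below is a variation on the query above, not a copy of it: it replaces the nested NATURAL JOIN with an explicit derived table, and it assumes SQLite in place of MySQL. The alias names cheapest and min_price are illustrative, and the category name 'food' is taken from the expected output in the question:

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; column types are guessed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER, name TEXT);
    CREATE TABLE products (id INTEGER, title TEXT, featured TEXT, category_id INTEGER);
    CREATE TABLE prices (id INTEGER, product_id INTEGER, price REAL, quantity INTEGER);
    INSERT INTO categories VALUES (99, 'food');
    INSERT INTO products VALUES
        (1, 'bread', 'yes', 99), (2, 'milk', 'yes', 99), (3, 'honey', 'yes', 99);
    INSERT INTO prices VALUES
        (1, 1, 99.99, 10), (2, 1, 150.00, 50), (3, 2, 33.10, 20), (4, 2, 10.00, 11);
""")

# "cheapest" holds, per product, the full price row whose price equals the
# product's minimum, so quantity comes from the same row as the price.
product_rows = conn.execute("""
    SELECT prod.id, prod.title, prod.featured, cat.name,
           cheapest.price, cheapest.quantity
    FROM products AS prod
    LEFT JOIN categories AS cat ON prod.category_id = cat.id
    LEFT JOIN (
        SELECT p.product_id, p.price, p.quantity
        FROM prices AS p
        JOIN (
            SELECT product_id, MIN(price) AS min_price
            FROM prices
            GROUP BY product_id
        ) AS m
          ON m.product_id = p.product_id
         AND m.min_price = p.price
    ) AS cheapest ON cheapest.product_id = prod.id
    ORDER BY prod.id
""").fetchall()
for row in product_rows:
    print(row)
```

Honey has no price rows, so the LEFT JOIN keeps it in the list with an empty price and quantity.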

MySQL - Exclude rows from Select based on duplication of two columns

I am attempting to narrow results of an existing complex query based on conditional matches on multiple columns within the returned data set. I'll attempt to simplify the data as much as possible here.
Assume that the following table structure represents the data that my existing complex query has already selected (here ordered by date):
+----+-----------+------+------------+
| id | remote_id | type | date |
+----+-----------+------+------------+
| 1 | 1 | A | 2011-01-01 |
| 3 | 1 | A | 2011-01-07 |
| 5 | 1 | B | 2011-01-07 |
| 4 | 1 | A | 2011-05-01 |
+----+-----------+------+------------+
I need to select from that data set based on the following criteria:
If the pairing of remote_id and type is unique to the set, return the row always
If the pairing of remote_id and type is not unique to the set, take the following action:
Of the sets of rows for which the pairing of remote_id and type are not unique, return only the single row for which date is greatest and still less than or equal to now.
So, if today is 2011-01-10, I'd like the data set returned to be:
+----+-----------+------+------------+
| id | remote_id | type | date |
+----+-----------+------+------------+
| 3 | 1 | A | 2011-01-07 |
| 5 | 1 | B | 2011-01-07 |
+----+-----------+------+------------+
For some reason I'm having no luck wrapping my head around this one. I suspect the answer lies in good application of group by, but I just can't grasp it. Any help is greatly appreciated!
/* Rows with exactly one date - always return regardless of when date occurs */
SELECT id, remote_id, type, date
FROM YourTable
GROUP BY remote_id, type
HAVING COUNT(*) = 1
UNION
/* Rows with more than one date - Return Max date <= NOW */
SELECT yt.id, yt.remote_id, yt.type, yt.date
FROM YourTable yt
INNER JOIN (SELECT remote_id, type, max(date) as maxdate
FROM YourTable
WHERE date <= DATE(NOW())
GROUP BY remote_id, type
HAVING COUNT(*) > 1) sq
ON yt.remote_id = sq.remote_id
AND yt.type = sq.type
AND yt.date = sq.maxdate
The group by clause groups all rows that have identical values of one or more columns together and returns one row in the result set for them. If you use aggregate functions (min, max, sum, avg etc.) that will be applied for each "group".
SELECT id, remote_id, type, max(date)
FROM blah
GROUP BY remote_id, type;
I'm not sure where today's date comes in, but I assume it is part of the complex query that you didn't describe and isn't directly relevant to your question here.
Try this:
SELECT a.*
FROM YourTable a INNER JOIN
(
select remote_id, type, MAX(date) date, COUNT(1) cnt from YourTable
group by remote_id, type
) b
ON a.remote_id = b.remote_id
AND a.type = b.type
AND a.date = b.date
WHERE ( (b.cnt = 1) OR (b.cnt>1 AND b.date <= DATE(NOW())))
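As a rough check of the selection rules in the question, the sketch below runs them against the four sample rows with "today" fixed at 2011-01-10. It is a variation rather than any of the answers' exact queries: one grouped subquery computes both the group size and the latest date that is not in the future (a CASE inside MAX). The table name items, the :today parameter, and the use of SQLite in place of MySQL are all assumptions:

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; "items" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER, remote_id INTEGER, type TEXT, date TEXT);
    INSERT INTO items VALUES
        (1, 1, 'A', '2011-01-01'),
        (3, 1, 'A', '2011-01-07'),
        (5, 1, 'B', '2011-01-07'),
        (4, 1, 'A', '2011-05-01');
""")

# Per (remote_id, type): cnt is the group size, max_past is the latest date
# that is <= today (NULL if every date is in the future).
# A row is kept if its pairing is unique, or if it carries max_past.
selected = conn.execute("""
    SELECT t.id, t.remote_id, t.type, t.date
    FROM items AS t
    JOIN (
        SELECT remote_id, type,
               COUNT(*) AS cnt,
               MAX(CASE WHEN date <= :today THEN date END) AS max_past
        FROM items
        GROUP BY remote_id, type
    ) AS g
      ON g.remote_id = t.remote_id
     AND g.type = t.type
    WHERE g.cnt = 1 OR t.date = g.max_past
    ORDER BY t.id
""", {"today": "2011-01-10"}).fetchall()
for row in selected:
    print(row)
```

Counting over the unfiltered group matters here: filtering on the date before counting would make a pairing with one past row and one future row look unique.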
Try this
select id, remote_id, type, MAX(date) from table
group by remote_id, type
Hey Carson! You could try using the "distinct" keyword on those two fields, and in a union you can use Count() along with group by and some operators to pull non-unique (greatest and less-than today) records!