MySQL - Exclude rows from Select based on duplication of two columns - mysql

I am attempting to narrow results of an existing complex query based on conditional matches on multiple columns within the returned data set. I'll attempt to simplify the data as much as possible here.
Assume that the following table structure represents the data that my existing complex query has already selected (here ordered by date):
+----+-----------+------+------------+
| id | remote_id | type | date |
+----+-----------+------+------------+
| 1 | 1 | A | 2011-01-01 |
| 3 | 1 | A | 2011-01-07 |
| 5 | 1 | B | 2011-01-07 |
| 4 | 1 | A | 2011-05-01 |
+----+-----------+------+------------+
I need to select from that data set based on the following criteria:
If the pairing of remote_id and type is unique to the set, return the row always
If the pairing of remote_id and type is not unique to the set, take the following action:
Of the sets of rows for which the pairing of remote_id and type are not unique, return only the single row for which date is greatest and still less than or equal to now.
So, if today is 2011-01-10, I'd like the data set returned to be:
+----+-----------+------+------------+
| id | remote_id | type | date |
+----+-----------+------+------------+
| 3 | 1 | A | 2011-01-07 |
| 5 | 1 | B | 2011-01-07 |
+----+-----------+------+------------+
For some reason I'm having no luck wrapping my head around this one. I suspect the answer lies in good application of group by, but I just can't grasp it. Any help is greatly appreciated!

/* Rows with exactly one date - always return regardless of when date occurs */
SELECT id, remote_id, type, date
FROM YourTable
GROUP BY remote_id, type
HAVING COUNT(*) = 1
UNION
/* Rows with more than one date - Return Max date <= NOW */
SELECT yt.id, yt.remote_id, yt.type, yt.date
FROM YourTable yt
INNER JOIN (SELECT remote_id, type, max(date) as maxdate
FROM YourTable
WHERE date <= DATE(NOW())
GROUP BY remote_id, type
HAVING COUNT(*) > 1) sq
ON yt.remote_id = sq.remote_id
AND yt.type = sq.type
AND yt.date = sq.maxdate

The group by clause groups all rows that have identical values of one or more columns together and returns one row in the result set for them. If you use aggregate functions (min, max, sum, avg etc.) that will be applied for each "group".
SELECT id, remote_id, type, max(date)
FROM blah
GROUP BY remote_id, date;
I'm not whore where today's date comes in, but assumed that was part of the complex query that you didn't describe and I assume isn't directly relevant to your question here.

Try this:
SELECT a.*
FROM table a INNER JOIN
(
select remote_id, type, MAX(date) date, COUNT(1) cnt from table
group by remote_id, type
) b
WHERE a.remote_id = b.remote_id,
AND a.type = b.type
AND a.date = b.date
AND ( (b.cnt = 1) OR (b.cnt>1 AND b.date <= DATE(NOW())))

Try this
select id, remote_id, type, MAX(date) from table
group by remote_id, type

Hey Carson! You could try using the "distinct" keyword on those two fields, and in a union you can use Count() along with group by and some operators to pull non-unique (greatest and less-than today) records!

Related

MySQL select - If a column value is redundant, only show the newest by timestamp

I have a table like this:
timesent |nr | value
2018-10-31 05:23:06 | 4 | Value 3
2018-10-31 05:20:19 | 4 | Value 2
2018-10-31 05:19:35 | 4 | Value 1
2018-10-31 04:55:56 | 3 | Value 2
2018-10-31 03:05:15 | 3 | Value 1
2018-10-31 01:31:49 | 2 | Value 1
2018-10-30 04:11:16 | 1 | Value 1
At the moment, my select looks like this:
SELECT * FROM values WHERE ORDER BY timesent DESC
I want to do an sql-select statement which gives me back only the most recent value of each "nr".
My skills are not good enough to translate that into a sql-statement. I donĀ“t even know what I should google for.
Values is a Reserved Keyword in MySQL. Consider changing your table name to something else; otherwise you will have to use backticks around it
There are various ways to achieve the result for your problem. One way is to do a "Self-Left-Join" on nr (field on which you want to get the maximum timesent value row only).
SELECT v1.*
FROM `values` AS v1
LEFT JOIN `values` AS v2
ON v1.nr = v2.nr AND
v1.timesent < v2.timesent
WHERE v2.nr IS NULL
For MySQL version >= 8.0.2, you can use Window Functions. We will determine Row_Number() for each row over a partition of nr, with timesent in Descending order (Highest timesent value will have row number = 1). Then, use this result-set in a Derived Table and consider only those rows, where row number is equal to 1.
SELECT dt.timesent,
dt.nr,
dt.value
FROM
(
SELECT v.timesent, v.nr, v.value,
ROW_NUMBER() OVER (PARTITION BY v.nr
ORDER BY v.timesent DESC) AS row_num
FROM `values` AS v
) AS dt
WHERE dt.row_num = 1
Yet, another approach is to get the maximum value of timesent for a nr group in a Derived Table. Now join this result-set to the main table, so that only the rows corresponding to max value appear:
SELECT v.timesent,
v.nr,
v.value
FROM
`values` AS v
JOIN
(
SELECT nr, MAX(timesent) AS max_timesent
FROM `values`
GROUP BY nr
) AS dt ON dt.nr = v.nr AND
dt.max_timesent = v.timesent

MySQL - select rows under an ID, group by column value that has the latest timestamp

Table:
----------------------------------------------------
ID | field_name | field_value | timestamp
----------------------------------------------------
2 | postcode | LS1 | 2016-11-09 16:45:15
2 | age | 34 | 2016-11-09 16:45:22
2 | job | Scientist | 2016-11-09 16:45:27
2 | age | 38 | 2016-11-09 16:46:40
7 | postcode | LS5 | 2016-11-09 16:47:05
7 | age | 24 | 2016-11-09 16:47:44
I wonder if anyone could give me a few pointers, based on the above data, I would like to query by ID 2, return a row for each unique field_name (if more than one row exists under the same id with the same field_name then just return the row with the latest timestamp).
I have managed to almost achieve this by grouping the field_name, which will return a list of unique rows but not necessarily the latest row.
SELECT * FROM fragment WHERE (id = :id) GROUP BY field_name
I would really be grateful for any pointers on what exactly I should do here, and how I could fit something along the lines of MAX(timestamp) in this query,
Many thanks!
Consider you first need a set of data for each ID, FieldName with the max time stamp. (generate that set) as an inline view (B below). Then, join this set (B) back to your base set allowing the inner join to eliminate the unwanted rows.
SELECT A.ID, A.field_name, A.field_value, A.timestamp
FROM Table A
INNER JOIN (SELECT ID, field_name, MAX(timestamp) TS
FROM table
GROUP BY ID, field_name) B
on A.ID = B.ID
and A.field_name = B.field_name
and A.timestamp = B.TS
Outside of MySQL this could be done using window/analytical functions as you would be able to assign a row number to each record and eliminate those > 1 something like....
SELECT B.*
FROM (SELECT A.ID
, A.field_name
, A.field_Vale
, A.timestamp
, Rownumber() over (Order by A.timestamp Desc) RN
FROM Table A ) B
WHERE B.RN = 1
or using a cross apply with a limit or top.
The Simpliest way to do:
SELECT *
FROM fragment fra1
WHERE (id = :id)
and timestamp = (select max(timestamp)
from fragment fra2
where fra2.id = fra1.id
and fra2.field_name = fra1.field_name)
GROUP BY field_name

How to iterate over possible values over values of a column

I have a table, which stores names and numbers associated with it. Each of these numbers are of a certain type. The combination of name and type is unique.
name | type | number
---------------------
a | t0 | 5
a | t1 | 6
b | t0 | 7
c | t1 | 8
And I want this table as a result, which shows the numbers which are associated with that name, regardless of type accumulated.
name | sum
------------
a | 11
c | 8
b | 7
Now I know the SUM() function and I have written this so far:
SELECT name, SUM( ??SELECT name, SUM(number) WHERE name = 'a'?? ) AS sum
FROM table
ORDER BY sum DESC;
The thing is I don't know how to write the inner part of the SUM() function, as I can't use SELECT statements inside of functions. Also I don't know how I can iterate over the possible values of the name column.
SELECT NAME,
Sum(number) As 'Sum'
FROM table
GROUP BY NAME
ORDER BY Sum(number) DESC;
You need to use GROUP BY along with SUM:
SELECT name, SUM(number) AS sum
FROM tableName
GROUP BY name
ORDER BY SUM(number) DESC; -- or ORDER BY sum DESC; - it is identical

MySQL: Transfer Data Based on a Column Without Also Transferring That Column

My table stores revision data for my CMS entries. Each entry has an ID and a revision date, and there are multiple revisions:
Table: old_revisions
+----------+---------------+-----------------------------------------+
| entry_id | revision_date | entry_data |
+----------+---------------+-----------------------------------------+
| 1 | 1302150011 | I like pie. |
| 1 | 1302148411 | I like pie and cookies. |
| 1 | 1302149885 | I like pie and cookies and cake. |
| 2 | 1288917372 | Kittens are cute. |
| 2 | 1288918782 | Kittens are cute but puppies are cuter. |
| 3 | 1288056095 | Han shot first. |
+----------+---------------+-----------------------------------------+
I want to transfer some of this data to another table:
Table: new_revisions
+--------------+----------------+
| new_entry_id | new_entry_data |
+--------------+----------------+
| | |
+--------------+----------------+
I want to transfer entry_id and entry_data to new_entry_id and new_entry_data. But I only want to transfer the most recent version of each entry.
I got as far as this query:
INSERT INTO new_revisions (
new_entry_id,
new_entry_data
)
SELECT
entry_id,
entry_data,
MAX(revision_date)
FROM old_revisions
GROUP BY entry_id
But I think the problem is that I'm trying to insert 3 columns of data into 2 columns.
How do I transfer the data based on the revision date without transferring the revision date as well?
You can use the following query:
insert into new_revisions (new_entry_id, new_entry_data)
select o1.entry_id, o1.entry_data
from old_revisions o1
inner join
(
select max(revision_date) maxDate, entry_id
from old_revisions
group by entry_id
) o2
on o1.entry_id = o2.entry_id
and o1.revision_date = o2.maxDate
See SQL Fiddle with Demo. This query gets the max(revision_date) for each entry_id and then joins back to your table on both the entry_id and the max date to get the rows to be inserted.
Please note that the subquery is only returning the entry_id and date, this is because we want to apply the GROUP BY to the items in the select list that are not in an aggregate function. MySQL uses an extension to the GROUP BY clause that allows columns in the select list to be excluded in a group by and aggregate but this could causes unexpected results. By only including the columns needed by the aggregate and the group by will ensure that the result is the value you want. (see MySQL Extensions to GROUP BY)
From the MySQL Docs:
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. ... You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values the server chooses.
If you want to enter the last entry you need to filter it before:
select entry_id, max(revision_date) as maxDate
from old_revisions
group by entry_id;
Then use this as a subquery to filter the data you need:
insert into new_revisions (new_entry_id, new_entry_data)
select entry_id, entry_data
from old_revisions as o
inner join (
select entry_id, max(revision_date) as maxDate
from old_revisions
group by entry_id
) as a on o.entry_id = a.entry_id and o.revision_date = a.maxDate

Using ORDER BY and GROUP BY together

My table looks like this (and I'm using MySQL):
m_id | v_id | timestamp
------------------------
6 | 1 | 1333635317
34 | 1 | 1333635323
34 | 1 | 1333635336
6 | 1 | 1333635343
6 | 1 | 1333635349
My target is to take each m_id one time, and order by the highest timestamp.
The result should be:
m_id | v_id | timestamp
------------------------
6 | 1 | 1333635349
34 | 1 | 1333635336
And i wrote this query:
SELECT * FROM table GROUP BY m_id ORDER BY timestamp DESC
But, the results are:
m_id | v_id | timestamp
------------------------
34 | 1 | 1333635323
6 | 1 | 1333635317
I think it causes because it first does GROUP_BY and then ORDER the results.
Any ideas? Thank you.
One way to do this that correctly uses group by:
select l.*
from table l
inner join (
select
m_id, max(timestamp) as latest
from table
group by m_id
) r
on l.timestamp = r.latest and l.m_id = r.m_id
order by timestamp desc
How this works:
selects the latest timestamp for each distinct m_id in the subquery
only selects rows from table that match a row from the subquery (this operation -- where a join is performed, but no columns are selected from the second table, it's just used as a filter -- is known as a "semijoin" in case you were curious)
orders the rows
If you really don't care about which timestamp you'll get and your v_id is always the same for a given m_i you can do the following:
select m_id, v_id, max(timestamp) from table
group by m_id, v_id
order by max(timestamp) desc
Now, if the v_id changes for a given m_id then you should do the following
select t1.* from table t1
left join table t2 on t1.m_id = t2.m_id and t1.timestamp < t2.timestamp
where t2.timestamp is null
order by t1.timestamp desc
Here is the simplest solution
select m_id,v_id,max(timestamp) from table group by m_id;
Group by m_id but get max of timestamp for each m_id.
You can try this
SELECT tbl.* FROM (SELECT * FROM table ORDER BY timestamp DESC) as tbl
GROUP BY tbl.m_id
SQL>
SELECT interview.qtrcode QTR, interview.companyname "Company Name", interview.division Division
FROM interview
JOIN jobsdev.employer
ON (interview.companyname = employer.companyname AND employer.zipcode like '100%')
GROUP BY interview.qtrcode, interview.companyname, interview.division
ORDER BY interview.qtrcode;
I felt confused when I tried to understand the question and answers at first. I spent some time reading and I would like to make a summary.
The OP's example is a little bit misleading.
At first I didn't understand why the accepted answer is the accepted answer.. I thought that the OP's request could be simply fulfilled with
select m_id, v_id, max(timestamp) as max_time from table
group by m_id, v_id
order by max_time desc
Then I took a second look at the accepted answer. And I found that actually the OP wants to express that, for a sample table like:
m_id | v_id | timestamp
------------------------
6 | 1 | 11
34 | 2 | 12
34 | 3 | 13
6 | 4 | 14
6 | 5 | 15
he wants to select all columns based only on (group by)m_id and (order by)timestamp.
Then the above sql won't work. If you still don't get it, imagine you have more columns than m_id | v_id | timestamp, e.g m_id | v_id | timestamp| columnA | columnB |column C| .... With group by, you can only select those "group by" columns and aggreate functions in the result.
By far, you should have understood the accepted answer.
What's more, check row_number function introduced in MySQL 8.0:
https://www.mysqltutorial.org/mysql-window-functions/mysql-row_number-function/
Finding top N rows of every group
It does the simlar thing as the accepted answer.
Some answers are wrong. My MySQL gives me error.
select m_id,v_id,max(timestamp) from table group by m_id;
#abinash sahoo
SELECT m_id,v_id,MAX(TIMESTAMP) AS TIME
FROM table_name
GROUP BY m_id
#Vikas Garhwal
Error message:
[42000][1055] Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'testdb.test_table.v_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
Why make it so complicated? This worked.
SELECT m_id,v_id,MAX(TIMESTAMP) AS TIME
FROM table_name
GROUP BY m_id
Just you need to desc with asc. Write the query like below. It will return the values in ascending order.
SELECT * FROM table GROUP BY m_id ORDER BY m_id asc;