SQL How to group by two columns - mysql

Below is an example table.
ID FROM TO DATE
1 Number1 Number2 somedate
2 Number2 Number1 somedate
3 Number2 Number1 somedate
4 Number3 Number1 somedate
5 Number3 Number2 somedate
The expected result is one row for each unique pair of TO and FROM values, regardless of direction.
Example result if ordered by ID ASC:
(1,Number1,Number2)
(4,Number3,Number1)
(5,Number3,Number2)
Ok I have found how to do this with the following query
SELECT * FROM table GROUP BY LEAST(to,from), GREATEST(to,from)
However I am not able to get the most recent record for every unique pair.
I have tried ORDER BY ID DESC, but it still returns the first row found for each unique pair.

SQL Fiddle isn't working for some reason, so in the meantime you will need to help me to help you.
Assuming that the following statement works
SELECT
LEAST(to,from) as LowVal,
GREATEST(to,from) as HighVal,
MAX(date) as MaxDate
FROM table
GROUP BY LEAST(to,from), GREATEST(to,from)
then you could join to that as
select t.*
from
table t
inner join
(SELECT
LEAST(to,from) as LowVal,
GREATEST(to,from) as HighVal,
MAX(date) as MaxDate
FROM table
GROUP BY LEAST(to,from), GREATEST(to,from)
) v
on t.date = v.MaxDate
and (t.From = v.LowVal or t.From = v.HighVal)
and (t.To = v.LowVal or t.To= v.HighVal)
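The join-back idea above can be tried out without a MySQL server. Here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in. The table name msgs, the column names sender, receiver and sent, and the dates are assumptions made for illustration, and SQLite's two-argument MIN()/MAX() take the place of LEAST()/GREATEST(). The join compares the normalized pair directly instead of using the OR conditions.

```python
import sqlite3

# In-memory SQLite stands in for MySQL. The table name `msgs`, its column
# names, and the dates are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE msgs (id INTEGER, sender TEXT, receiver TEXT, sent TEXT)")
conn.executemany(
    "INSERT INTO msgs VALUES (?, ?, ?, ?)",
    [
        (1, "Number1", "Number2", "2015-01-01"),
        (2, "Number2", "Number1", "2015-01-02"),
        (3, "Number2", "Number1", "2015-01-03"),
        (4, "Number3", "Number1", "2015-01-04"),
        (5, "Number3", "Number2", "2015-01-05"),
    ],
)

# Two-argument MIN()/MAX() are scalar in SQLite and play the role of
# LEAST()/GREATEST(); one-argument MAX(sent) is the usual aggregate.
latest_per_pair = conn.execute(
    """
    SELECT m.id, m.sender, m.receiver, m.sent
    FROM msgs AS m
    JOIN (
        SELECT MIN(sender, receiver) AS low_val,
               MAX(sender, receiver) AS high_val,
               MAX(sent) AS max_sent
        FROM msgs
        GROUP BY MIN(sender, receiver), MAX(sender, receiver)
    ) AS v
      ON m.sent = v.max_sent
     AND MIN(m.sender, m.receiver) = v.low_val
     AND MAX(m.sender, m.receiver) = v.high_val
    ORDER BY m.id
    """
).fetchall()

for row in latest_per_pair:
    print(row)
```

With this sample data the query keeps rows 3, 4 and 5, one per unordered pair.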

I believe the following would work, though my knowledge is of Microsoft SQL Server, not MySQL. If MySQL lacks one of these features, let me know and I'll delete the answer.
DECLARE @Table1 TABLE(
    ID int,
    Too varchar(10),
    Fromm varchar(10),
    Compared int)
INSERT INTO @Table1 values (1, 'John','Mary', 2), (2,'John', 'Mary', 1), (3,'Sue','Charles',1), (4,'Mary','John',3)
SELECT ID, Too, Fromm, Compared
FROM @Table1 as t
INNER JOIN
(
    SELECT
        CASE WHEN Too < Fromm THEN Too+Fromm
             ELSE Fromm+Too
        END as orderedValues, MIN(compared) as minComp
    FROM @Table1
    GROUP BY CASE WHEN Too < Fromm THEN Too+Fromm
                  ELSE Fromm+Too
             END
) ordered ON
    ordered.minComp = t.Compared
    AND ordered.orderedValues =
        CASE
            WHEN Too < Fromm
            THEN Too+Fromm
            ELSE
            Fromm+Too
        END
I used an int instead of a time value, but it would work the same way. It's dirty, but it gives me the results I expected.
The basic idea is to use a derived query that takes the two columns you want unique pairs for and uses a CASE expression to combine them into a standard format: here, the alphabetically earlier value concatenated with the later one. Group by that value to get the minimum we are looking for, then join back to the original table to get the two values separated out again, plus whatever else is in that table. This assumes the value we are aggregating is unique within each pair, so if there were both (1, 'John', 'Mary', 2) and (2, 'Mary', 'John', 2), it would break and return 2 records for that couple.

This answer was originally inspired by Get records with max value for each group of grouped SQL results
but then I looked further and came up with the correct solution.
CREATE TABLE T
(`id` int, `from` varchar(7), `to` varchar(7), `somedate` datetime)
;
INSERT INTO T
(`id`, `from`, `to`, `somedate`)
VALUES
(1, 'Number1', 'Number2', '2015-01-01 00:00:00'),
(2, 'Number2', 'Number1', '2015-01-02 00:00:00'),
(3, 'Number2', 'Number1', '2015-01-03 00:00:00'),
(4, 'Number3', 'Number1', '2015-01-04 00:00:00'),
(5, 'Number3', 'Number2', '2015-01-05 00:00:00');
Tested on MySQL 5.6.19
SELECT *
FROM
(
SELECT *
FROM T
ORDER BY LEAST(`to`,`from`), GREATEST(`to`,`from`), somedate DESC
) X
GROUP BY LEAST(`to`,`from`), GREATEST(`to`,`from`)
Result set
id from to somedate
3 Number2 Number1 2015-01-03
4 Number3 Number1 2015-01-04
5 Number3 Number2 2015-01-05
But, this relies on some shady behavior of MySQL, which will be changed in future versions. MySQL 5.7 rejects this query because the columns in the SELECT clause are not functionally dependent on the GROUP BY columns. If it is configured to accept it (ONLY_FULL_GROUP_BY is disabled), it works like the previous versions, but still it is not guaranteed: "The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate."
So, the correct answer seems to be this:
SELECT T.*
FROM
T
INNER JOIN
(
SELECT
LEAST(`to`,`from`) AS LowVal,
GREATEST(`to`,`from`) AS HighVal,
MAX(somedate) AS MaxDate
FROM T
GROUP BY LEAST(`to`,`from`), GREATEST(`to`,`from`)
) v
ON T.somedate = v.MaxDate
AND (T.From = v.LowVal OR T.From = v.HighVal)
AND (T.To = v.LowVal OR T.To = v.HighVal)
The result set is the same as above, but in this case it is guaranteed to stay that way, whereas before you could easily get a different date and id for the row Number2, Number1, depending on which indexes you have on the table.
It will work as expected until the original data contains two rows with exactly the same somedate and the same pair of to and from values.
Let's add another row:
INSERT INTO T (`id`, `from`, `to`, `somedate`)
VALUES (6, 'Number1', 'Number2', '2015-01-03 00:00:00');
The query above would return two rows for 2015-01-03:
id from to somedate
3 Number2 Number1 2015-01-03
6 Number1 Number2 2015-01-03
4 Number3 Number1 2015-01-04
5 Number3 Number2 2015-01-05
To fix this we need a way to choose only one row per group. In this example we can use the unique ID to break the tie: if more than one row in the group has the same maximum date, we choose the row with the largest ID.
The innermost sub-query, called Groups, simply returns all groups, like the original query in the question. Then we add an id column to this result set: for each group we pick the id of the row that belongs to that group and has the highest somedate and then the highest id, which is done with ORDER BY and LIMIT. This sub-query is called GroupsWithIDs. Once we have all groups and the id of the correct row for each group, we join this to the original table to fetch the rest of the columns for the ids found.
Final query
SELECT T.*
FROM
(
SELECT
Groups.N1
,Groups.N2
,
(
SELECT T.id
FROM T
WHERE
LEAST(`to`,`from`) = Groups.N1 AND
GREATEST(`to`,`from`) = Groups.N2
ORDER BY T.somedate DESC, T.id DESC
LIMIT 1
) AS id
FROM
(
SELECT LEAST(`to`,`from`) AS N1, GREATEST(`to`,`from`) AS N2
FROM T
GROUP BY LEAST(`to`,`from`), GREATEST(`to`,`from`)
) AS Groups
) AS GroupsWithIDs
INNER JOIN T ON T.id = GroupsWithIDs.id
Final result set
id from to somedate
4 Number3 Number1 2015-01-04
5 Number3 Number2 2015-01-05
6 Number1 Number2 2015-01-03
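To check the tie-breaking logic without a MySQL server, here is a rough sketch of the same query shape run against SQLite from Python. The column names sender and receiver are assumptions (they replace the reserved words from and to), the aliases g and picked are made up for the example, and SQLite's two-argument MIN()/MAX() replace LEAST()/GREATEST().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column names `sender`/`receiver` stand in for `from`/`to`.
conn.execute("CREATE TABLE T (id INTEGER, sender TEXT, receiver TEXT, somedate TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?)",
    [
        (1, "Number1", "Number2", "2015-01-01 00:00:00"),
        (2, "Number2", "Number1", "2015-01-02 00:00:00"),
        (3, "Number2", "Number1", "2015-01-03 00:00:00"),
        (4, "Number3", "Number1", "2015-01-04 00:00:00"),
        (5, "Number3", "Number2", "2015-01-05 00:00:00"),
        (6, "Number1", "Number2", "2015-01-03 00:00:00"),  # ties with id 3
    ],
)

winners = conn.execute(
    """
    SELECT T.id, T.sender, T.receiver, T.somedate
    FROM (
        SELECT g.n1, g.n2,
               -- newest row of the pair; the larger id wins a date tie
               (SELECT t2.id
                FROM T AS t2
                WHERE MIN(t2.sender, t2.receiver) = g.n1
                  AND MAX(t2.sender, t2.receiver) = g.n2
                ORDER BY t2.somedate DESC, t2.id DESC
                LIMIT 1) AS id
        FROM (
            SELECT DISTINCT MIN(sender, receiver) AS n1,
                            MAX(sender, receiver) AS n2
            FROM T
        ) AS g
    ) AS picked
    JOIN T ON T.id = picked.id
    ORDER BY T.id
    """
).fetchall()

for row in winners:
    print(row)
```

Rows 3 and 6 share the latest date for the Number1/Number2 pair, and the secondary ORDER BY on id selects row 6, so the ids returned are 4, 5 and 6.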

Related

Inner-Join on temporary columns using MAX()

I have the table t in MariaDB (latest), which includes, among others, the columns person_ID, date_1 and date_2. They contain person IDs and string dates respectively. For each ID there is only one date_1, but multiple date_2 values. Each row has either a date_1 or a date_2, which is why I am joining on ID. Here is an example of the table t:
person_ID date_1 date_2
A - 3
A - 5
A 1 -
B - 10
B - 14
B 5 -
C - 11
C - 9
C 7 -
Create and fill table t:
CREATE TABLE t(
id SERIAL,
person_ID TEXT,
date_1 TEXT,
date_2 TEXT,
PRIMARY KEY (id)
);
INSERT INTO t (person_ID, date_2) VALUES ('A', 3);
INSERT INTO t (person_ID, date_2) VALUES ('A', 5);
INSERT INTO t (person_ID, date_1) VALUES ('A', 1);
INSERT INTO t (person_ID, date_2) VALUES ('B', 10);
INSERT INTO t (person_ID, date_2) VALUES ('B', 14);
INSERT INTO t (person_ID, date_1) VALUES ('B', 5);
INSERT INTO t (person_ID, date_2) VALUES ('C', 11);
INSERT INTO t (person_ID, date_2) VALUES ('C', 9);
INSERT INTO t (person_ID, date_1) VALUES ('C', 7);
SET GLOBAL sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''));
The following is an inner-join of two subqueries A and B. Query A gives a distinct list of person_IDs, which contain a date_1, and date_1 itself. On the other hand query B should give a distinct list of person_IDs that contain a date_2, and MAX(date_2).
SELECT A.person_ID, A.date_A, B.date_B, B.date_B - A.date_A AS diff FROM
(SELECT person_ID, date_1 AS date_A FROM t
WHERE date_1 >= 0) A
INNER JOIN
(SELECT person_ID, MAX(date_2) AS date_B FROM t
WHERE date_2 >= 0
GROUP BY person_ID) B
ON A.person_ID = B.person_ID
AND B.date_B > A.date_A
AND (B.date_B - A.date_A) <= 7
GROUP BY A.person_ID;
That gives the output:
person_ID date_A date_B diff
A 1 5 4
C 7 9 2
But this would be the desired outcome (ignoring ID = B, because diff = 9):
person_ID date_A date_B diff
A 1 5 4
C 7 11 4
I assume MAX(date_2) gives 9 instead of 11 for person_ID = C, because that value was inserted last for date_2.
You can use this link to try it out yourself.
This problem is made harder by your sparse table (rows with NULLs). Here's how I would approach this.
Start with a subquery to clean up the sparse table. It generates a result set where the rows with nulls are removed, generating a result like this.
person_ID date_1 date_2 diff
A 1 3 2
A 1 5 4
B 5 10 5
B 5 14 9
C 7 11 4
C 7 9 2
This puts the single date_1 value for each person into the rows with the date_2 values. The query to do that is:
SELECT t.person_ID, b.date_1, t.date_2, t.date_2 - b.date_1 diff
FROM t
JOIN t b ON t.person_ID = b.person_ID
AND b.date_1 IS NOT NULL
AND t.date_2 IS NOT NULL
Let's name the output of that subquery with the alias detail.
Your business logic calls for the very common greatest-n-per-group query pattern. It calls for retrieving the row with the largest diff for each person_ID, as long as diff <= 7. With that detail subquery we can write your logic a little more easily. In your result set you want the row for each person_ID that shows date_1, date_2, and diff for the largest diff, but leaving out any rows with a diff > 7.
First, write another subquery that finds the largest qualifying diff value for each person_ID.
SELECT person_ID, MAX(diff) diff
FROM detail
GROUP BY person_ID
HAVING MAX(diff) <= 7
Then join that subquery to the detail to get your desired result set.
SELECT detail.*
FROM detail
JOIN ( SELECT person_ID, MAX(diff) diff
FROM detail
GROUP BY person_ID
HAVING MAX(diff) <= 7
) md ON detail.person_ID = md.person_ID
AND detail.diff = md.diff
I used a common table expression (CTE) to define detail. That syntax is available in MariaDB 10.2+ (and MySQL 8+). Putting it together, here is the query.
WITH detail AS
(SELECT t.person_ID, b.date_1, t.date_2, t.date_2 - b.date_1 diff
FROM t
JOIN t b ON t.person_ID = b.person_ID
AND b.date_1 IS NOT NULL
AND t.date_2 IS NOT NULL
)
SELECT detail.*
FROM detail
JOIN ( SELECT person_ID, MAX(diff) diff
FROM detail
GROUP BY person_ID
HAVING MAX(diff) <= 7
) md ON detail.person_ID = md.person_ID
AND detail.diff = md.diff
Summary: the steps to solving your problem.
1. Deal with the sparse-data problem in your input table: get rid of the input rows with NULL values by filling in date_1 values in the rows that have date_2 values. And throw in the diffs.
2. Find the largest eligible diff value for each person_ID.
3. Join that list of largest diffs back into the detail table to extract the correct row of the detail table.
Pro tip: Don't turn off ONLY_FULL_GROUP_BY. You don't want to rely on MySQL / MariaDB's strange nonstandard extension to GROUP BY, because it sometimes yields the wrong values. When it does that it's baffling.
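If you want to experiment with the query above outside MariaDB, here is a minimal sketch that runs the same CTE against SQLite from Python. It assumes the date columns are declared INTEGER rather than TEXT, so that MAX() and the subtraction compare numbers and not strings; the aliases d1 and d2 are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: INTEGER date columns (the question declares them as TEXT).
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, person_ID TEXT,"
    " date_1 INTEGER, date_2 INTEGER)"
)
conn.executemany(
    "INSERT INTO t (person_ID, date_1, date_2) VALUES (?, ?, ?)",
    [
        ("A", None, 3), ("A", None, 5), ("A", 1, None),
        ("B", None, 10), ("B", None, 14), ("B", 5, None),
        ("C", None, 11), ("C", None, 9), ("C", 7, None),
    ],
)

rows = conn.execute(
    """
    WITH detail AS (
        -- copy each person's single date_1 onto every date_2 row
        SELECT d2.person_ID, d1.date_1, d2.date_2,
               d2.date_2 - d1.date_1 AS diff
        FROM t AS d2
        JOIN t AS d1 ON d2.person_ID = d1.person_ID
                    AND d1.date_1 IS NOT NULL
                    AND d2.date_2 IS NOT NULL
    )
    SELECT detail.person_ID, detail.date_1, detail.date_2, detail.diff
    FROM detail
    JOIN (
        -- largest diff per person, kept only if it is at most 7
        SELECT person_ID, MAX(diff) AS diff
        FROM detail
        GROUP BY person_ID
        HAVING MAX(diff) <= 7
    ) AS md ON detail.person_ID = md.person_ID
           AND detail.diff = md.diff
    ORDER BY detail.person_ID
    """
).fetchall()

for row in rows:
    print(row)
```

Person B drops out because its largest diff is 9, which leaves one row each for A and C.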

Comparing n with (n-1) and (n-2) records in SQL

Write a SQL statement which can generate the list of customers whose minutes Streamed is consistently less than the previous minutes Streamed. As in minutes Streamed in the nth order is less than minutes Streamed in n-1th order, and the next previous order is also less. Another way to say it, list the customers that watch less and less minutes each time they watch a movie.
The table, query:
sqlfiddle link:
I have come up with the following query:
select distinct c1.customer_Id
from Customer c1
join Customer c2
where c1.customer_Id = c2.customer_Id
and c1.purchaseDate > c2.purchaseDate
and c1.minutesStreamed < c2.minutesStreamed;
This query doesn't deal with the (n-1)st and (n-2)nd comparison, i.e. "minutes Streamed in the nth order is less than minutes Streamed in n-1th order, and the next previous order is also less." condition.
I have attached a link for sqlfiddle, where I have created the table.
Hello Continuous Learner,
the following statement works for the n-1 and n-2 relation.
select distinct c1.customer_Id
from Customer c1
join Customer c2
on c1.customer_Id = c2.customer_Id
join Customer c3
on c1.customer_Id = c3.customer_Id
where c1.purchaseDate < c2.purchaseDate
and c1.minutesStreamed > c2.minutesStreamed
and c2.purchaseDate < c3.purchaseDate
and c2.minutesStreamed > c3.minutesStreamed
However, I currently don't have a general solution that handles an arbitrary number of orders automatically.
Cheers
I would use the ROW_NUMBER() function, partitioned by customer id,
and then do a self join on customer id and row number = row number - 1, to bring each order and the previous one onto the same row.
Like:
-- Step 1: number each customer's orders in purchase order
create table temp_rank_table as
select
customer_Id,
purchaseDate,
minutesStreamed,
ROW_NUMBER() OVER (PARTITION BY customer_Id ORDER BY purchaseDate, minutesStreamed) as cust_row
from Customer;
-- Step 2: self join each order to the previous one and count the decreases
select A.customer_Id
from
( select
newval.customer_Id,
sum(case when newval.minutesStreamed < oldval.minutesStreamed then 1 else 0 end) as LessThanPrevCount,
max(newval.cust_row) as totalStreamCount
from temp_rank_table newval
left join temp_rank_table oldval
on newval.customer_id = oldval.customer_id
and newval.cust_row-1 = oldval.cust_row -- cust_row 2 matches to cust_row 1
group by newval.customer_id
) A
where A.LessThanPrevCount = (A.totalStreamCount-1)
-- get customers who always stream less than the previous time
-- you can use a HAVING clause instead of a subquery too
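As a rough way to try the ROW_NUMBER() idea, the sketch below runs it against SQLite from Python, using a CTE in place of the temporary table and the HAVING variant mentioned in the comment. The Customer rows are invented sample data, since the original table is not shown, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (customer_Id INTEGER, purchaseDate TEXT,"
    " minutesStreamed INTEGER)"
)
# Hypothetical sample data: only customer 1 streams less every time.
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", 90), (1, "2020-01-02", 60), (1, "2020-01-03", 30),
        (2, "2020-01-01", 50), (2, "2020-01-02", 70), (2, "2020-01-03", 40),
        (3, "2020-01-01", 80), (3, "2020-01-02", 80),
    ],
)

decreasing = [
    row[0]
    for row in conn.execute(
        """
        WITH ranked AS (
            SELECT customer_Id, minutesStreamed,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer_Id ORDER BY purchaseDate
                   ) AS cust_row
            FROM Customer
        )
        SELECT newval.customer_Id
        FROM ranked AS newval
        LEFT JOIN ranked AS oldval
               ON newval.customer_Id = oldval.customer_Id
              AND newval.cust_row - 1 = oldval.cust_row
        GROUP BY newval.customer_Id
        -- every order after the first must be lower than the one before it
        HAVING SUM(CASE WHEN newval.minutesStreamed < oldval.minutesStreamed
                        THEN 1 ELSE 0 END) = MAX(newval.cust_row) - 1
        ORDER BY newval.customer_Id
        """
    )
]

print(decreasing)
```

Note that under this condition a customer with a single order would also qualify (0 decreases out of 0 comparisons), so you may want an extra filter if that is not desired.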
DECLARE @TBL AS TABLE ( [NO] INT, [CODE] VARCHAR(50), [AREA] VARCHAR(50) )
/* EXAMPLE 1 */
INSERT INTO @TBL([NO],[CODE],[AREA]) VALUES (1,'001','A00')
INSERT INTO @TBL([NO],[CODE],[AREA]) VALUES (2,'001','A00')
INSERT INTO @TBL([NO],[CODE],[AREA]) VALUES (3,'001','B00')
INSERT INTO @TBL([NO],[CODE],[AREA]) VALUES (4,'001','C00')
INSERT INTO @TBL([NO],[CODE],[AREA]) VALUES (5,'001','C00')
INSERT INTO @TBL([NO],[CODE],[AREA]) VALUES (6,'001','A00')
INSERT INTO @TBL([NO],[CODE],[AREA]) VALUES (7,'001','A00')
/* EXAMPLE 2 */
/* ***** USE THIS CODE TO ENTER DATA FROM DIRECT TABLE *****
SELECT ROW_NUMBER() OVER(ORDER BY [FIELD_DATE]) AS [NO]
,[FIELD_CODE] AS [CODE]
,[FIELD_AREA] AS [AREA]
FROM TABLE_A
WHERE CAST([FIELD_DATE] AS DATE) >= CAST('20200307' AS DATE)
ORDER BY [FIELD_DATE],[FIELD_CODE]
*/
SELECT A.NO AS ANO
,A.CODE AS ACODE
,A.AREA AS AAREA
,B.NO AS BNO
,B.CODE AS BCODE
,B.AREA AS BAREA
,CASE WHEN A.AREA=B.AREA THEN 'EQUAL' ELSE 'NOT EQUAL' END AS [COMPARE AREA]
FROM @TBL A
LEFT JOIN @TBL B ON A.NO=B.NO+1

How to add records to query

I've got this query:
SELECT * FROM '.PRFX.'sell
WHERE draft = "0" '.$e_sql.'
AND ID NOT IN (SELECT id_ FROM '.PRFX.'skipped WHERE uid = "'.$u.'")
AND ID NOT IN (SELECT id_ FROM '.PRFX.'followed WHERE uid = "'.$u.'")
ORDER BY raised DESC '.$sql_limit;
I want to add the 3 records with the lowest number of refreshes, ideally at the 5th position.
The rows must be unique (so if you connect two queries with UNION ALL...)
First, you need to make your SQL more readable. Something like this:
SELECT * FROM sell
WHERE draft = 0
AND ID NOT IN (SELECT id_ FROM skipped WHERE uid = '0')
AND ID NOT IN (SELECT id_ FROM followed WHERE uid = '0')
ORDER BY raised DESC LIMIT 15
Then, what exactly do you want? To add data to the sell table in a single request? That can be done with a request like this:
INSERT INTO sell (key1, key2, keyN)
VALUES
('aaa', 'bbb', 'ccc'),
('ddd', 'eee', 'fff');
-- and so forth.

Looking for missed IDs in SQL Server 2008

I have a table that contains two columns
ID | Name
----------------
1 | John
2 | Sam
3 | Peter
6 | Mike
It has missed IDs. In this case these are 4 and 5.
How do I find and insert them together with random names into this table?
Update: cursors and temp tables are not allowed. The random name should be 'Name_'+ some random number. Maybe it would be the specified value like 'Abby'. So it doesn't matter.
Using a recursive CTE you can determine the missing IDs as follows
DECLARE @Table TABLE(
    ID INT,
    Name VARCHAR(10)
)
INSERT INTO @Table VALUES (1, 'John'),(2, 'Sam'),(3,'Peter'),(6, 'Mike')
DECLARE @StartID INT,
        @EndID INT
SELECT @StartID = MIN(ID),
       @EndID = MAX(ID)
FROM @Table
;WITH IDS AS (
    SELECT @StartID IDEntry
    UNION ALL
    SELECT IDEntry + 1
    FROM IDS
    WHERE IDEntry + 1 <= @EndID
)
SELECT IDS.IDEntry [ID]
FROM IDS LEFT JOIN
     @Table t ON IDS.IDEntry = t.ID
WHERE t.ID IS NULL
OPTION (MAXRECURSION 0)
The option MAXRECURSION 0 lets the query exceed SQL Server's default recursion limit of 100 levels.
From Query Hints and WITH common_table_expression (Transact-SQL)
MAXRECURSION number Specifies the maximum number of recursions
allowed for this query. number is a nonnegative integer between 0 and
32767. When 0 is specified, no limit is applied. If this option is not specified, the default limit for the server is 100.
When the specified or default number for MAXRECURSION limit is reached
during query execution, the query is ended and an error is returned.
Because of this error, all effects of the statement are rolled back.
If the statement is a SELECT statement, partial results or no results
may be returned. Any partial results returned may not include all rows
on recursion levels beyond the specified maximum recursion level.
Generating the random names will largely depend on the requirements for such a name and on the column type. What exactly does this random name entail?
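For anyone who wants to try the gap-finding logic without SQL Server, here is a minimal sketch of the same recursive CTE run against SQLite from Python. The table name people is an assumption, the names follow the 'Name_' plus random number pattern from the question's update, and SQLite needs no MAXRECURSION hint.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name `people`; the data is taken from the question.
conn.execute("CREATE TABLE people (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "John"), (2, "Sam"), (3, "Peter"), (6, "Mike")],
)

# Generate every ID from MIN(ID) to MAX(ID), then keep the ones with no row.
missing_ids = [
    row[0]
    for row in conn.execute(
        """
        WITH RECURSIVE ids(id_entry) AS (
            SELECT MIN(ID) FROM people
            UNION ALL
            SELECT id_entry + 1 FROM ids
            WHERE id_entry + 1 <= (SELECT MAX(ID) FROM people)
        )
        SELECT ids.id_entry
        FROM ids
        LEFT JOIN people AS p ON ids.id_entry = p.ID
        WHERE p.ID IS NULL
        ORDER BY ids.id_entry
        """
    )
]

# Fill the gaps with placeholder names of the form Name_<random number>.
conn.executemany(
    "INSERT INTO people (ID, Name) VALUES (?, ?)",
    [(i, f"Name_{random.randint(0, 9999)}") for i in missing_ids],
)

all_ids = [row[0] for row in conn.execute("SELECT ID FROM people ORDER BY ID")]
print(missing_ids)
print(all_ids)
```

The SELECT and the INSERT are kept as two steps here only to make the intermediate list of missing IDs visible; they could be combined into a single INSERT ... SELECT.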
You can do this using a recursive Common Table Expression CTE. Here's an example how:
DECLARE @MaxId INT
SELECT @MaxId = MAX(ID) from MyTable
;WITH Numbers(Number) AS
(
    SELECT 1
    UNION ALL
    SELECT Number + 1 FROM Numbers WHERE Number < @MaxId
)
SELECT n.Number, 'Random Name'
FROM Numbers n
LEFT OUTER JOIN MyTable t ON n.Number=t.ID
WHERE t.ID IS NULL
Here are a couple of articles about CTEs that will be helpful: Using Common Table Expressions and Recursive Queries Using Common Table Expressions.
Start by selecting the highest number in the table (select top 1 id desc), or select max(id), then run a while loop to iterate from 1...max.
See this article about looping.
For each iteration, see if the row exists, and if not, insert into table, with that ID.
I think recursive CTE is a better solution, because it's going to be faster, but here is what worked for me:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[TestTable]') AND type in (N'U'))
DROP TABLE [dbo].[TestTable]
GO
CREATE TABLE [dbo].[TestTable](
[Id] [int] NOT NULL,
[Name] [varchar](50) NOT NULL,
CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED
(
[Id] ASC
))
GO
INSERT INTO [dbo].[TestTable]([Id],[Name]) VALUES (1, 'John')
INSERT INTO [dbo].[TestTable]([Id],[Name]) VALUES (2, 'Sam')
INSERT INTO [dbo].[TestTable]([Id],[Name]) VALUES (3, 'Peter')
INSERT INTO [dbo].[TestTable]([Id],[Name]) VALUES (6, 'Mike')
GO
declare @mod int
select @mod = MAX(number)+1 from master..spt_values where [type] = 'P'
INSERT INTO [dbo].[TestTable]
SELECT y.Id,'Name_' + cast(newid() as varchar(45)) Name from
(
SELECT TOP (select MAX(Id) from [dbo].[TestTable]) x.Id from
(
SELECT
t1.number*@mod + t2.number Id
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
WHERE t1.[type] = 'P' and t2.[type] = 'P'
) x
WHERE x.Id > 0
ORDER BY x.Id
) y
LEFT JOIN [dbo].[TestTable] on [TestTable].Id = y.Id
where [TestTable].Id IS NULL
GO
select * from [dbo].[TestTable]
order by Id
GO
http://www.sqlfiddle.com/#!3/46c7b/18
It's actually very simple:
Create a table called #All_numbers, which should contain all the natural numbers in the range that you are looking for.
#list is a table containing your data.
select a.num as missing_number ,
'Random_Name' + convert(varchar, a.num)
from #All_numbers a left outer join #list l on a.num = l.Id
where l.id is null

Delete rows without leading zeros

I have a table with a column (registration_no varchar(9)). Here is a sample:
id registration no
1 42400065
2 483877668
3 019000702
4 837478848
5 464657588
6 19000702
7 042400065
Please take note of registration numbers like (042400065) and (42400065): they are almost the same, the only difference being the leading zero.
I want to select all registration numbers that fit this case and delete the ones without a leading zero, i.e. (42400065).
Please also note that before I delete the ones without leading zeros (42400065), I need to be sure that there is an equivalent with a leading zero (042400065).
declare @T table
(
    id int,
    [registration no] varchar(9)
)
insert into @T values
(1, '42400065'),
(2, '483877668'),
(3, '019000702'),
(4, '837478848'),
(5, '464657588'),
(6, '19000702'),
(7, '042400065')
;with C as
(
    select row_number() over(partition by cast([registration no] as int)
                             order by [registration no]) as rn
    from @T
)
delete from C
where rn > 1
create table temp (id int);
-- collect the ids of rows without a leading zero that have a zero-prefixed twin
insert into temp
select a.id from your_table a
where left(a.registration_no, 1) <> '0'
and exists (select 1 from your_table b
where b.registration_no = concat('0', a.registration_no));
delete from your_table where id in (select id from temp);
drop table temp;
I think you can do this with a single DELETE statement. The JOIN ensures that only duplicates can get deleted, and the constraint limits it further by the registration numbers that don't start with a '0'.
DELETE
r1
FROM
Registration r1
JOIN
Registration r2 ON RIGHT(r1.RegistrationNumber, 8) = r2.RegistrationNumber
WHERE
LEFT(r1.RegistrationNumber, 1) <> '0'
Your table looks like this after running the above DELETE. I tested it on a SQL Server 2008 instance.
ID RegistrationNumber
----------- ------------------
2 483877668
3 019000702
4 837478848
5 464657588
7 042400065
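The requirement from the question (delete a number only if a copy with a leading zero exists) can also be written as an explicit EXISTS check. Below is a small sketch against SQLite from Python; the table name registration and column name registration_no are assumptions, '||' is SQLite's string concatenation, and it assumes the duplicate differs by exactly one leading zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; the data is taken from the question.
conn.execute(
    "CREATE TABLE registration (id INTEGER PRIMARY KEY, registration_no TEXT)"
)
conn.executemany(
    "INSERT INTO registration VALUES (?, ?)",
    [
        (1, "42400065"), (2, "483877668"), (3, "019000702"),
        (4, "837478848"), (5, "464657588"), (6, "19000702"),
        (7, "042400065"),
    ],
)

# Delete a row only when it has no leading zero AND the same number
# prefixed with '0' is present in another row.
conn.execute(
    """
    DELETE FROM registration
    WHERE registration_no NOT LIKE '0%'
      AND EXISTS (
          SELECT 1
          FROM registration AS z
          WHERE z.registration_no = '0' || registration.registration_no
      )
    """
)

remaining = conn.execute(
    "SELECT id, registration_no FROM registration ORDER BY id"
).fetchall()
for row in remaining:
    print(row)
```

Rows 1 and 6 are removed because 042400065 and 019000702 exist; an 8-digit number without a zero-prefixed twin would be left alone.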
This solution won't depend on the registration numbers being a particular length, it just looks for the ones that are the same integer, yet not the same value (because of the leading zeroes) and selects for the entry that has a '0' as the first character.
DELETE r
FROM Registration AS r
JOIN Registration AS r1 ON r.RegistrationNo = CAST(r1.RegistrationNo AS INT)
AND r.RegistrationNo <> r1.RegistrationNo
WHERE CHARINDEX('0',r.registrationno) = 1