Inner-Join on temporary columns using MAX() - mysql

I have the table t in MariaDB (latest), which includes, among others, the columns person_ID, date_1, and date_2. They contain person IDs and string dates, respectively. For each ID there is only one date_1 but multiple date_2 values. Each row has either a date_1 or a date_2, which is why I am joining on the ID. Here is an example of the table t:
person_ID date_1 date_2
A - 3
A - 5
A 1 -
B - 10
B - 14
B 5 -
C - 11
C - 9
C 7 -
Create and fill table t:
CREATE TABLE t(
id SERIAL,
person_ID TEXT,
date_1 TEXT,
date_2 TEXT,
PRIMARY KEY (id)
);
INSERT INTO t (person_ID, date_2) VALUES ('A', 3);
INSERT INTO t (person_ID, date_2) VALUES ('A', 5);
INSERT INTO t (person_ID, date_1) VALUES ('A', 1);
INSERT INTO t (person_ID, date_2) VALUES ('B', 10);
INSERT INTO t (person_ID, date_2) VALUES ('B', 14);
INSERT INTO t (person_ID, date_1) VALUES ('B', 5);
INSERT INTO t (person_ID, date_2) VALUES ('C', 11);
INSERT INTO t (person_ID, date_2) VALUES ('C', 9);
INSERT INTO t (person_ID, date_1) VALUES ('C', 7);
SET GLOBAL sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''));
The following is an inner join of two subqueries, A and B. Query A gives a distinct list of the person_IDs that have a date_1, together with that date_1. Query B should give a distinct list of the person_IDs that have a date_2, together with MAX(date_2).
SELECT A.person_ID, A.date_A, B.date_B, B.date_B - A.date_A AS diff FROM
(SELECT person_ID, date_1 AS date_A FROM t
WHERE date_1 >= 0) A
INNER JOIN
(SELECT person_ID, MAX(date_2) AS date_B FROM t
WHERE date_2 >= 0
GROUP BY person_ID) B
ON A.person_ID = B.person_ID
AND B.date_B > A.date_A
AND (B.date_B - A.date_A) <= 7
GROUP BY A.person_ID;
That gives the output:
person_ID date_A date_B diff
A 1 5 4
C 7 9 2
But this would be the desired outcome (ignoring ID = B, because diff = 9):
person_ID date_A date_B diff
A 1 5 4
C 7 11 4
I assume MAX(date_2) gives 9 instead of 11 for person_ID = C, because that value was inserted last for date_2.
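A likely cause, not raised in the thread, is worth checking: date_1 and date_2 are declared TEXT, so MAX(date_2) compares strings rather than numbers, and as a string '9' sorts after '11'. The sketch below reproduces this with Python's built-in sqlite3 module standing in for MariaDB (an assumption: for these digit-only strings, SQLite's default text comparison orders values the same way MariaDB's collations do). Casting to a number before aggregating restores numeric ordering.

```python
import sqlite3

# In-memory SQLite database standing in for MariaDB (assumption for illustration).
# INTEGER PRIMARY KEY replaces SERIAL, which SQLite does not auto-increment.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, person_ID TEXT, date_1 TEXT, date_2 TEXT)"
)
rows = [
    ("A", None, 3), ("A", None, 5), ("A", 1, None),
    ("B", None, 10), ("B", None, 14), ("B", 5, None),
    ("C", None, 11), ("C", None, 9), ("C", 7, None),
]
# TEXT affinity stores the inserted integers as strings, like the question's TEXT columns.
conn.executemany("INSERT INTO t (person_ID, date_1, date_2) VALUES (?, ?, ?)", rows)

query = """
    SELECT person_ID,
           MAX(date_2)                  AS text_max,    -- string comparison
           MAX(CAST(date_2 AS INTEGER)) AS numeric_max  -- numeric comparison
    FROM t
    WHERE date_2 IS NOT NULL
    GROUP BY person_ID
    ORDER BY person_ID
"""
for person, text_max, numeric_max in conn.execute(query):
    print(person, repr(text_max), numeric_max)
# A '5' 5
# B '14' 14
# C '9' 11
```

In MariaDB the analogous cast would be MAX(CAST(date_2 AS UNSIGNED)); storing the dates in a DATE or numeric column avoids the problem altogether.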

This problem is made harder by your sparse table (rows with NULLs). Here's how I would approach this.
Start with a subquery to clean up the sparse table. It removes the rows with NULLs and produces a result set like this:
person_ID date_1 date_2 diff
A 1 3 2
A 1 5 4
B 5 10 5
B 5 14 9
C 7 11 4
C 7 9 2
This puts the single date_1 value for each person into the rows with the date_2 values. The query to do that is:
SELECT t.person_ID, b.date_1, t.date_2, t.date_2 - b.date_1 diff
FROM t
JOIN t b ON t.person_ID = b.person_ID
AND b.date_1 IS NOT NULL
AND t.date_2 IS NOT NULL
Let's name the output of that subquery with the alias detail.
Your business logic calls for the very common greatest-n-per-group query pattern: retrieve the row with the largest diff for each person_ID, as long as that largest diff is <= 7. The detail subquery makes this logic a little easier to write. In your result set you want, for each person_ID, the row showing date_1, date_2, and diff for the largest diff, leaving out any person_ID whose largest diff is greater than 7.
First, write another subquery that finds the largest diff for each person_ID and keeps it only if it is <= 7.
SELECT person_ID, MAX(diff) diff
FROM detail
GROUP BY person_ID
HAVING MAX(diff) <= 7
Then join that subquery to the detail to get your desired result set.
SELECT detail.*
FROM detail
JOIN ( SELECT person_ID, MAX(diff) diff
FROM detail
GROUP BY person_ID
HAVING MAX(diff) <= 7
) md ON detail.person_ID = md.person_ID
AND detail.diff = md.diff
I used a common table expression (a WITH clause) to define detail. That syntax is available in MariaDB 10.2+ and MySQL 8+. Putting it all together, here is the query.
WITH detail AS
(SELECT t.person_ID, b.date_1, t.date_2, t.date_2 - b.date_1 diff
FROM t
JOIN t b ON t.person_ID = b.person_ID
AND b.date_1 IS NOT NULL
AND t.date_2 IS NOT NULL
)
SELECT detail.*
FROM detail
JOIN ( SELECT person_ID, MAX(diff) diff
FROM detail
GROUP BY person_ID
HAVING MAX(diff) <= 7
) md ON detail.person_ID = md.person_ID
AND detail.diff = md.diff
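To check the combined query against the sample data, here is a minimal sketch that runs it through Python's sqlite3 module. SQLite is only a stand-in for MariaDB here (an assumption; both support WITH and convert numeric-looking TEXT operands for subtraction), and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# SQLite stands in for MariaDB; INTEGER PRIMARY KEY replaces SERIAL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, person_ID TEXT, date_1 TEXT, date_2 TEXT)"
)
conn.executemany(
    "INSERT INTO t (person_ID, date_1, date_2) VALUES (?, ?, ?)",
    [
        ("A", None, 3), ("A", None, 5), ("A", 1, None),
        ("B", None, 10), ("B", None, 14), ("B", 5, None),
        ("C", None, 11), ("C", None, 9), ("C", 7, None),
    ],
)

# Same CTE as the answer; the "-" operator converts the TEXT operands to numbers.
query = """
WITH detail AS (
    SELECT t.person_ID, b.date_1, t.date_2, t.date_2 - b.date_1 AS diff
    FROM t
    JOIN t b ON t.person_ID = b.person_ID
            AND b.date_1 IS NOT NULL
            AND t.date_2 IS NOT NULL
)
SELECT detail.*
FROM detail
JOIN (SELECT person_ID, MAX(diff) AS diff
      FROM detail
      GROUP BY person_ID
      HAVING MAX(diff) <= 7) md
  ON detail.person_ID = md.person_ID
 AND detail.diff = md.diff
ORDER BY detail.person_ID
"""
for row in conn.execute(query):
    print(row)
# ('A', '1', '5', 4)
# ('C', '7', '11', 4)
```

Person B drops out because its largest diff is 9, which matches the desired outcome in the question.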
Summary: the steps to solving your problem.
1. Deal with the sparse data in your input table: get rid of the rows with NULL values by copying each person's date_1 into the rows that have date_2 values, and compute the diffs along the way.
2. Find the largest diff for each person_ID, keeping only those that are <= 7.
3. Join that list of largest diffs back to the detail table to extract the matching rows.
Pro tip: Don't turn off ONLY_FULL_GROUP_BY. You don't want to rely on MySQL / MariaDB's nonstandard extension to GROUP BY, because it can return values taken from arbitrary rows of each group. When that happens, the wrong results are baffling to debug.

Related

Mysql: How to join a query to find results from another table

I have two tables:
TABLE A
Unique_id id price
1 1 10.50
2 3 14.70
3 1 12.44
TABLE B
Unique_id Date Category Store Cost
1 2022/03/12 Shoes A 13.24
2 2022/04/15 Hats A 15.24
3 2021/11/03 Shoes B 22.31
4 2000/12/14 Shoes A 15.33
I need to filter TABLE A on a known id to get the Unique_id and average price to join to Table B.
Using this information I need to know which stores this item was sold in.
I then need to create a results table displaying the stores, the number of days on which sales were recorded in each store (regardless of whether the sales are associated with the id), and the average cost.
To put it more simply I can break down the task into 2 separate commands:
SELECT AVG(price)
FROM table_a
WHERE id = 1
GROUP BY unique_id;
SELECT store, COUNT(date), AVG(cost)
FROM table_b
WHERE category = 'Shoes'
GROUP BY store;
The unique_id should inform the join, but when I join the tables it messes up my COUNT function: it only counts the days on which the id is connected, not the total store sales days.
The results should look something like this:
Store | AVG price | COUNT days | AVG cost
A | 10.50 | 3 | 14.60
B | 12.44 | 1 | 22.31
It was hard to grasp what you wanted, but after some thinking and your clarification, it can be solved as the following code shows:
CREATE TABLE TableA
(`Unique_id` int, `id` int, `price` DECIMAL(10,2))
;
INSERT INTO TableA
(`Unique_id`, `id`, `price`)
VALUES
(1, 1, 10.50),
(2, 3, 14.70),
(3, 1, 12.44)
;
CREATE TABLE TableB
(`Unique_id` int, `Date` datetime, `Category` varchar(5), `Store` varchar(1), `Cost` DECIMAL(10,2))
;
INSERT INTO TableB
(`Unique_id`, `Date`, `Category`, `Store`, `Cost`)
VALUES
(1, '2022-03-12 01:00:00', 'Shoes', 'A', 13.24),
(2, '2022-04-15 02:00:00', 'Hats', 'A', 15.24),
(3, '2021-11-03 01:00:00', 'Shoes', 'B', 22.31),
(4, '2000-12-14 01:00:00', 'Shoes', 'A', 15.33)
;
SELECT
B.`Store`
, AVG(A.`price`) price
, (SELECT COUNT(*) FROM TableB WHERE `Store` = B.`Store`) count_
, (SELECT AVG(`cost`) FROM TableB WHERE `Store` = B.`Store`) avg_cost
FROM TableA A
JOIN TableB B ON A.`Unique_id` = B.`Unique_id`
WHERE B.`Category` = 'Shoes'
GROUP BY B.`Store`
Store | price | count_ | avg_cost
:---- | --------: | -----: | --------:
A | 10.500000 | 3 | 14.603333
B | 12.440000 | 1 | 22.310000
This should be the query you are after. Mainly you simply join the rows using an outer join, because not every table_b row has a match in table_a.
Then, the only hindrance is that you only want to consider shoes in your average price. For this to happen you use conditional aggregation (a CASE expression inside the aggregation function).
select
b.store,
avg(case when b.category = 'Shoes' then a.price end) as avg_shoe_price,
count(b.unique_id) as count_b_rows,
avg(b.cost) as avg_cost
from table_b b
left outer join table_a a on a.unique_id = b.unique_id
group by b.store
order by b.store;
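A minimal sketch that runs both answers' queries against the question's sample data, using Python's sqlite3 module as a stand-in for MySQL (an assumption). The first answer's query is adapted to the table_a/table_b names used by the second, Date is stored as plain text, and averages are rounded to two decimals for comparison; both queries should yield the same per-store figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (unique_id INT, id INT, price DECIMAL(10,2));
INSERT INTO table_a VALUES (1, 1, 10.50), (2, 3, 14.70), (3, 1, 12.44);
CREATE TABLE table_b (unique_id INT, "date" TEXT, category TEXT, store TEXT, cost DECIMAL(10,2));
INSERT INTO table_b VALUES
  (1, '2022-03-12', 'Shoes', 'A', 13.24),
  (2, '2022-04-15', 'Hats',  'A', 15.24),
  (3, '2021-11-03', 'Shoes', 'B', 22.31),
  (4, '2000-12-14', 'Shoes', 'A', 15.33);
""")

# First answer: inner join for the shoe prices, correlated subqueries for per-store totals.
correlated = """
SELECT b.store,
       AVG(a.price),
       (SELECT COUNT(*) FROM table_b WHERE store = b.store),
       (SELECT AVG(cost) FROM table_b WHERE store = b.store)
FROM table_a a
JOIN table_b b ON a.unique_id = b.unique_id
WHERE b.category = 'Shoes'
GROUP BY b.store
ORDER BY b.store
"""

# Second answer: outer join plus conditional aggregation.
conditional = """
SELECT b.store,
       AVG(CASE WHEN b.category = 'Shoes' THEN a.price END),
       COUNT(b.unique_id),
       AVG(b.cost)
FROM table_b b
LEFT OUTER JOIN table_a a ON a.unique_id = b.unique_id
GROUP BY b.store
ORDER BY b.store
"""

for name, sql in (("correlated", correlated), ("conditional", conditional)):
    rows = [(s, round(p, 2), n, round(c, 2)) for s, p, n, c in conn.execute(sql)]
    print(name, rows)
# correlated [('A', 10.5, 3, 14.6), ('B', 12.44, 1, 22.31)]
# conditional [('A', 10.5, 3, 14.6), ('B', 12.44, 1, 22.31)]
```

The count and average cost for store A include the Hats row, which is what the requested result (3 days, 14.60) implies.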
I must admit it took me ages to understand what you want and where these numbers come from. The main reason is that your query has WHERE table_a.id = 1, but that filter must not be applied to get the result you are showing. Next time, please make sure that your description, queries, and sample data match.
(I also think that names like table_a, table_b, and unique_id don't help. If table_a were called prices, table_b costs, and unique_id cost_id, I wouldn't have had to wonder how the tables are related (by id? by unique_id?), and I wouldn't have had to look again and again to see which table holds the cost, which holds the price, and which is the outer-joined one while studying the problem and the requested result and while writing my query.)

Join Two tables but replace overlapping data with second tables

Hello, I have an issue I am working on for a theoretical problem. Assume I have these two tables:
Order Table
Entry Order# DatePlaced Type
2001 5 2021-05-03 C
Status Table
Entry Order# Status Date Deleted
2001 5 S 2021-05-04 0
2002 5 D 2021-05-05 0
So I need to be able to get this:
Expected Table
Entry Order# DatePlaced Type Status Date Deleted
2002 5 2021-05-03 C D 2021-05-05 0
This would be fairly easy if I could just left join the data. The issue is that the SQL in the code is already written as shown below: the tables are joined on the entry. Every time a new status occurs for an order#, the entry in the Order Table is updated, EXCEPT when the order is delivered. Due to how dependent the code is on it, I cannot simply change the initial query below. Is there a join, or some other way without using SET, that gets the last status for each order? I was thinking we could check the order# and then the entry, but I am not sure how to join that with the Current Table (the data we get from the query).
SELECT * FROM orders o
LEFT JOIN status st ON o.entry = st.entry
WHERE st.deleted = 0;
This results in this:
Current Table
Entry Order# DatePlaced Type Status Date Deleted
2001 5 2021-05-03 C S 2021-05-04 0
Is there a way to JOIN the status table with the Current Table so that the status columns become what I expect?
This will work just fine:
SELECT s.entry, s.order_no, o.date_placed, o.type, s.status, s.date, s.deleted
FROM `orders` o
INNER JOIN `status` s ON (
s.order_no=o.order_no AND s.entry=(SELECT MAX(entry) FROM status WHERE order_no = o.order_no)
)
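A minimal sketch of this answer's query run against the question's sample rows, using Python's sqlite3 module as a stand-in for MySQL (an assumption). Column names follow the answer (order_no, date_placed, and so on) rather than the question's Order# and DatePlaced, and the sketch assumes that the latest status is the one with the highest entry number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (entry INT, order_no INT, date_placed TEXT, type TEXT);
CREATE TABLE status (entry INT, order_no INT, status TEXT, date TEXT, deleted INT);
INSERT INTO orders VALUES (2001, 5, '2021-05-03', 'C');
INSERT INTO status VALUES
  (2001, 5, 'S', '2021-05-04', 0),
  (2002, 5, 'D', '2021-05-05', 0);
""")

# Join each order to the status row with the highest entry for its order_no.
query = """
SELECT s.entry, s.order_no, o.date_placed, o.type, s.status, s.date, s.deleted
FROM orders o
INNER JOIN status s
  ON s.order_no = o.order_no
 AND s.entry = (SELECT MAX(entry) FROM status WHERE order_no = o.order_no)
"""
print(conn.execute(query).fetchall())
# [(2002, 5, '2021-05-03', 'C', 'D', '2021-05-05', 0)]
```

If "last" should instead mean the latest status date, the correlated subquery would select MAX(date) and compare it with s.date.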
Live Demo
https://www.db-fiddle.com/f/twz1TT9VH7YNTY1KrpRAjx/3
Does the last status have a higher entry number or higher date created?
Perhaps include MAX(st.Entry) AS last_entry in your SELECT clause. Maybe select your fields explicitly instead of using SELECT *, and include a GROUP BY after your WHERE clause and a HAVING after your GROUP BY.
create table orders (
entry INT,
order_number INT,
date_placed date,
order_type VARCHAR(1) );
create table order_status (
entry INT,
order_number INT,
order_status VARCHAR(1),
date_created date,
deleted INT
);
INSERT INTO orders (entry, order_number, date_placed, order_type) VALUES (2001, 5, '2021-05-03', 'C');
INSERT INTO order_status (entry, order_number, order_status, date_created, deleted)
VALUES
(2001, 5, 'S', '2021-05-04', 0),
(2002, 5, 'D', '2021-05-05', 0);
SELECT os.entry, o.order_number, o.date_placed, o.order_type,
os.order_status, os.date_created, os.deleted,
MAX(os.entry) as last_entry
FROM orders o
LEFT JOIN order_status os
ON o.order_number = os.order_number
GROUP BY o.order_number
HAVING os.entry = last_entry

SQL How to group by two columns

Below is an example table.
ID FROM TO DATE
1 Number1 Number2 somedate
2 Number2 Number1 somedate
3 Number2 Number1 somedate
4 Number3 Number1 somedate
5 Number3 Number2 somedate
The expected result is one row for each unique pair of TO and FROM values, regardless of which column each value is in.
Example result if ordered by ID ASC
(1,Number1,Number2)
(4,Number3,Number1)
(5,Number3,Number2)
OK, I have found how to do this with the following query:
SELECT * FROM table GROUP BY LEAST(to,from), GREATEST(to,from)
However, I am not able to get the most recent record for every unique pair.
I have tried ORDER BY id DESC, but it still returns the first row found for each unique pair.
SQL Fiddle isn't working for some reason, so in the meantime you will need to help me to help you.
Assuming that the following statement works
SELECT
LEAST(to,from) as LowVal,
GREATEST(to,from) as HighVal,
MAX(date) as MaxDate
FROM table
GROUP BY LEAST(to,from), GREATEST(to,from)
then you could join to that as
select t.*
from
table t
inner join
(SELECT
LEAST(to,from) as LowVal,
GREATEST(to,from) as HighVal,
MAX(date) as MaxDate
FROM table
GROUP BY LEAST(to,from), GREATEST(to,from)
) v
on t.date = v.MaxDate
and (t.From = v.LowVal or t.From = v.HighVal)
and (t.To = v.LowVal or t.To= v.HighVal)
I believe the following would work; my knowledge is with Microsoft SQL Server, not MySQL. If MySQL lacks one of these features, let me know and I'll delete the answer.
DECLARE @Table1 TABLE(
ID int,
Too varchar(10),
Fromm varchar(10),
Compared int)
INSERT INTO @Table1 values (1, 'John','Mary', 2), (2,'John', 'Mary', 1), (3,'Sue','Charles',1), (4,'Mary','John',3)
SELECT ID, Too, Fromm, Compared
FROM @Table1 as t
INNER JOIN
(
SELECT
CASE WHEN Too < Fromm THEN Too+Fromm
ELSE Fromm+Too
END as orderedValues, MIN(compared) as minComp
FROM @Table1
GROUP BY CASE WHEN Too < Fromm THEN Too+Fromm
ELSE Fromm+Too
END
) ordered ON
ordered.minComp = t.Compared
AND ordered.orderedValues =
CASE
WHEN Too < Fromm
THEN Too+Fromm
ELSE
Fromm+Too
END
I used an int instead of time value, but it would work the same. It's dirty, but it's giving me the results I expected.
The basic idea is to use a derived query that takes the two columns you want unique pairs for and uses a CASE expression to combine them into a standard format: in this case, the alphabetically earlier value concatenated with the alphabetically later one. Use that value to get the minimum value we are looking for, then join back to the original table to get the two values separated out again, plus whatever else is in that table. It assumes the aggregated value is unique within each pair, so if there were (1, 'John', 'Mary', 2) and (2, 'Mary', 'John', 2), it would break and return 2 records for that couple.
This answer was originally inspired by "Get records with max value for each group of grouped SQL results", but then I looked further and came up with the correct solution.
CREATE TABLE T
(`id` int, `from` varchar(7), `to` varchar(7), `somedate` datetime)
;
INSERT INTO T
(`id`, `from`, `to`, `somedate`)
VALUES
(1, 'Number1', 'Number2', '2015-01-01 00:00:00'),
(2, 'Number2', 'Number1', '2015-01-02 00:00:00'),
(3, 'Number2', 'Number1', '2015-01-03 00:00:00'),
(4, 'Number3', 'Number1', '2015-01-04 00:00:00'),
(5, 'Number3', 'Number2', '2015-01-05 00:00:00');
Tested on MySQL 5.6.19
SELECT *
FROM
(
SELECT *
FROM T
ORDER BY LEAST(`to`,`from`), GREATEST(`to`,`from`), somedate DESC
) X
GROUP BY LEAST(`to`,`from`), GREATEST(`to`,`from`)
Result set
id from to somedate
3 Number2 Number1 2015-01-03
4 Number3 Number1 2015-01-04
5 Number3 Number2 2015-01-05
But this relies on some shady behavior of MySQL, which will be changed in future versions. MySQL 5.7 rejects this query because the columns in the SELECT clause are not functionally dependent on the GROUP BY columns. If it is configured to accept it (ONLY_FULL_GROUP_BY is disabled), it works like the previous versions, but it is still not guaranteed: "The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate."
So, the correct answer seems to be this:
SELECT T.*
FROM
T
INNER JOIN
(
SELECT
LEAST(`to`,`from`) AS LowVal,
GREATEST(`to`,`from`) AS HighVal,
MAX(somedate) AS MaxDate
FROM T
GROUP BY LEAST(`to`,`from`), GREATEST(`to`,`from`)
) v
ON T.somedate = v.MaxDate
AND (T.From = v.LowVal OR T.From = v.HighVal)
AND (T.To = v.LowVal OR T.To = v.HighVal)
The result set is the same as above, but in this case it is guaranteed to stay like this, whereas before you could easily get a different date and id for the pair Number2, Number1, depending on what indexes you have on the table.
It works as expected until the original data has two rows with exactly the same somedate and the same pair of to and from values.
Let's add another row:
INSERT INTO T (`id`, `from`, `to`, `somedate`)
VALUES (6, 'Number1', 'Number2', '2015-01-03 00:00:00');
The query above would return two rows for 2015-01-03:
id from to somedate
3 Number2 Number1 2015-01-03
6 Number1 Number2 2015-01-03
4 Number3 Number1 2015-01-04
5 Number3 Number2 2015-01-05
To fix this, we need a way to choose only one row per group. In this example we can use the unique ID to break the tie: if more than one row in the group has the same maximum date, we choose the row with the largest ID.
The innermost subquery, called Groups, simply returns all groups, like the original query in the question. Then we add an id column to this result set: for each group we choose the id that belongs to that group and has the highest somedate and then the highest id, which is done by ORDER BY and LIMIT. This subquery is called GroupsWithIDs. Once we have all groups and the id of the correct row for each group, we join this to the original table to fetch the rest of the columns for those ids.
final query
SELECT T.*
FROM
(
SELECT
Groups.N1
,Groups.N2
,
(
SELECT T.id
FROM T
WHERE
LEAST(`to`,`from`) = Groups.N1 AND
GREATEST(`to`,`from`) = Groups.N2
ORDER BY T.somedate DESC, T.id DESC
LIMIT 1
) AS id
FROM
(
SELECT LEAST(`to`,`from`) AS N1, GREATEST(`to`,`from`) AS N2
FROM T
GROUP BY LEAST(`to`,`from`), GREATEST(`to`,`from`)
) AS Groups
) AS GroupsWithIDs
INNER JOIN T ON T.id = GroupsWithIDs.id
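A minimal sketch that loads the sample rows, including the tie row with id 6, into SQLite via Python's sqlite3 module (an assumption: SQLite standing in for MySQL) and runs both the MaxDate join and the final query. SQLite has no LEAST/GREATEST, so its multi-argument scalar min()/max() are used instead, and the derived table alias Groups is renamed to pairs (GROUPS is a keyword in newer SQLite and a reserved word in MySQL 8.0, so the unquoted alias is best avoided there).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (id INT, `from` TEXT, `to` TEXT, somedate TEXT);
INSERT INTO T (id, `from`, `to`, somedate) VALUES
  (1, 'Number1', 'Number2', '2015-01-01 00:00:00'),
  (2, 'Number2', 'Number1', '2015-01-02 00:00:00'),
  (3, 'Number2', 'Number1', '2015-01-03 00:00:00'),
  (4, 'Number3', 'Number1', '2015-01-04 00:00:00'),
  (5, 'Number3', 'Number2', '2015-01-05 00:00:00'),
  (6, 'Number1', 'Number2', '2015-01-03 00:00:00');
""")

# Join on the maximum date per unordered pair: the tie on 2015-01-03 yields two rows.
max_date_join = """
SELECT T.id
FROM T
JOIN (SELECT min(`to`, `from`) AS LowVal, max(`to`, `from`) AS HighVal,
             MAX(somedate) AS MaxDate
      FROM T
      GROUP BY min(`to`, `from`), max(`to`, `from`)) v
  ON T.somedate = v.MaxDate
 AND (T.`from` = v.LowVal OR T.`from` = v.HighVal)
 AND (T.`to` = v.LowVal OR T.`to` = v.HighVal)
ORDER BY T.id
"""

# Final query: pick one id per pair (latest somedate, then highest id), then join back.
final_query = """
SELECT T.*
FROM (
    SELECT pairs.N1,
           pairs.N2,
           (SELECT T.id
            FROM T
            WHERE min(`to`, `from`) = pairs.N1
              AND max(`to`, `from`) = pairs.N2
            ORDER BY T.somedate DESC, T.id DESC
            LIMIT 1) AS id
    FROM (SELECT min(`to`, `from`) AS N1, max(`to`, `from`) AS N2
          FROM T
          GROUP BY min(`to`, `from`), max(`to`, `from`)) AS pairs
) AS GroupsWithIDs
INNER JOIN T ON T.id = GroupsWithIDs.id
ORDER BY T.id
"""

print([row[0] for row in conn.execute(max_date_join)])  # [3, 4, 5, 6]
print([row[0] for row in conn.execute(final_query)])    # [4, 5, 6]
```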
final result set
id from to somedate
4 Number3 Number1 2015-01-04
5 Number3 Number2 2015-01-05
6 Number1 Number2 2015-01-03

Convert columns into rows with inner join in mysql

Please take a look at this fiddle.
I'm working on a search filter select box and I want to insert the field names of a table as rows.
Here's the table schema:
CREATE TABLE general
(`ID` int, `letter` varchar(21), `double-letters` varchar(21))
;
INSERT INTO general
(`ID`,`letter`,`double-letters`)
VALUES
(1, 'A','BB'),
(2, 'A','CC'),
(3, 'C','BB'),
(4, 'D','DD'),
(5, 'D','EE'),
(6, 'F','TT'),
(7, 'G','UU'),
(8, 'G','ZZ'),
(9, 'I','UU')
;
CREATE TABLE options
(`ID` int, `options` varchar(15))
;
INSERT INTO options
(`ID`,`options`)
VALUES
(1, 'letter'),
(2, 'double-letters')
;
The ID field in options table acts as a foreign key, and I want to get an output like the following and insert into a new table:
id field value
1 1 A
2 1 C
3 1 D
4 1 F
5 1 G
6 1 I
7 2 BB
8 2 CC
9 2 DD
10 2 EE
11 2 TT
12 2 UU
13 2 ZZ
My failed attempt:
select DISTINCT(a.letter),'letter' AS field
from general a
INNER JOIN
options b ON b.options = field
union all
select DISTINCT(a.double-letters), 'double-letters' AS field
from general a
INNER JOIN
options b ON b.options = field
Pretty sure you want this:
select distinct a.letter, 'letter' AS field
from general a
cross JOIN options b
where b.options = 'letter'
union all
select distinct a.`double-letters`, 'double-letters' AS field
from general a
cross JOIN options b
where b.options = 'double-letters'
Fiddle: http://sqlfiddle.com/#!2/bbf0b/18/0
A couple of things to point out: you can't join on a column alias. Because the column you're aliasing is a literal that you're selecting, you can specify that literal as the criteria in the WHERE clause.
You're not really joining on anything between GENERAL and OPTIONS, so what you really want is a CROSS JOIN; the criteria that you're putting into the ON clause actually belongs in the WHERE clause.
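A minimal sketch of the full task in the question (inserting the distinct values into a new table), built on the answer's idea of filtering options with a literal. It assumes a hypothetical target table filter_values, uses options.ID instead of the string literal so the field column holds the foreign key the question asks for, and runs on SQLite through Python's sqlite3 module as a stand-in for MySQL. The sequential id comes from SQLite's INTEGER PRIMARY KEY; in MySQL an AUTO_INCREMENT column would play that role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE general (ID INT, letter TEXT, `double-letters` TEXT);
INSERT INTO general VALUES
  (1,'A','BB'),(2,'A','CC'),(3,'C','BB'),(4,'D','DD'),(5,'D','EE'),
  (6,'F','TT'),(7,'G','UU'),(8,'G','ZZ'),(9,'I','UU');
CREATE TABLE options (ID INT, options TEXT);
INSERT INTO options VALUES (1,'letter'),(2,'double-letters');
-- Hypothetical target table; INTEGER PRIMARY KEY assigns id automatically.
CREATE TABLE filter_values (id INTEGER PRIMARY KEY, field INT, value TEXT);
""")

# One SELECT per source column; the literal picks the matching options row,
# whose ID becomes the field (foreign key) value.
conn.execute("""
INSERT INTO filter_values (field, value)
SELECT DISTINCT b.ID, a.letter
FROM general a JOIN options b ON b.options = 'letter'
UNION ALL
SELECT DISTINCT b.ID, a.`double-letters`
FROM general a JOIN options b ON b.options = 'double-letters'
ORDER BY 1, 2
""")

for row in conn.execute("SELECT id, field, value FROM filter_values ORDER BY id"):
    print(row)
```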
I just made this query on Oracle.
It works and produces the output you described:
SELECT ID, CASE WHEN LENGTH(VALUE)=2 THEN 2 ELSE 1 END AS FIELD, VALUE
FROM (
SELECT rownum AS ID, letter AS VALUE FROM (SELECT DISTINCT letter FROM general ORDER BY letter)
UNION
SELECT (SELECT COUNT(DISTINCT LETTER) FROM general) +rownum AS ID, double_letters AS VALUE
FROM (
SELECT DISTINCT double_letters FROM general ORDER BY double_letters)
)
It should also be adaptable to MySQL, although rownum is Oracle-specific and would need to be replaced.
I did not use the options table. I do not understand its role, and for this example and this type of output it seems unnecessary.
Hope this helps.

Need Help on .... Select Where Date NOT BETWEEN

Hope someone can tell ..
Table A
Id Date
1 2012-12-10
2 2012-12-11
Table E
Id Start_date End_date
1 2012-12-09 2012-12-10
2 2012-12-12 2012-12-14
The result I'm hoping for:
2012-12-11
This is the code that I think might work to select the dates from Table A that are not in any Table E date range:
SELECT * FROM `A`
WHERE `A`.`DATE` NOT BETWEEN (SELECT `E`.`DATE_START` FROM `E`) AND (SELECT `E`.`DATE_END`
FROM `E`);
but unfortunately it doesn't work: the subquery returns more than 1 row.
I wonder how??
thanks
You wonder how the subquery returned more than one row? That's because there's more than one row in the table matching your query.
If you want one row, you'll need to limit the query a little more, such as with:
select `e`.`date_start` from `e` where `e`.`id` = 1
If you want all dates in A that are not contained in any date range in E, one way to do it is to get a list of the A dates that are contained within a range, and then get a list of dates from A that aren't in that list.
Something like:
select date
from a
where date not in (
select a.date
from a, e
where a.date between e.start_date and e.end_date
)
Putting this through the excellent phpMyAdmin demo site as:
create table a (id int, d date);
create table e (id int, sd date, ed date);
insert into a (id, d) values (1, '2012-12-10');
insert into a (id, d) values (2, '2012-12-11');
insert into e (id, sd, ed) values (3, '2012-12-09', '2012-12-10');
insert into e (id, sd, ed) values (4, '2012-12-12', '2012-12-14');
select d from a where d not in (
select a.d from a, e where a.d between e.sd and e.ed
);
results in the output:
2012-12-11
as desired.
To get all records from A that are not inside any of the date ranges in E, get the records that are within the date ranges, and select the ones not in that result:
select *
from A
where Id not in (
select A.Id
from A
inner join E on A.Date between E.Start_date and E.End_date
)
If the Id in table A is the same as the Id in table E:
SELECT *
FROM A, E
WHERE A.Id = E.Id
AND A.Date NOT BETWEEN E.Start_Date AND E.End_Date
What you're looking for here is the set of records in A where there does not exist a record in B for which the date in A is between the begin and end dates in B.
Therefore I'd suggest that you structure the query in that way.
Something like ...
Select ...
From table_A
Where not exists (
Select null
From table_b
Where ...)
Depending on the join cardinality of the tables and their sizes, you may find that this performs better than the "find the rows that are not in the set for which a join exists" method, besides being a more intuitive match to your logic.
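The NOT EXISTS shape above, filled in for the question's tables A and E as a minimal sketch. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption); the ISO-formatted dates are stored as text, which compares correctly with BETWEEN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (Id INT, Date TEXT);
CREATE TABLE E (Id INT, Start_date TEXT, End_date TEXT);
INSERT INTO A VALUES (1, '2012-12-10'), (2, '2012-12-11');
INSERT INTO E VALUES (1, '2012-12-09', '2012-12-10'), (2, '2012-12-12', '2012-12-14');
""")

# Keep each A row for which no E range contains its date.
query = """
SELECT A.Date
FROM A
WHERE NOT EXISTS (
    SELECT NULL
    FROM E
    WHERE A.Date BETWEEN E.Start_date AND E.End_date
)
"""
print(conn.execute(query).fetchall())
# [('2012-12-11',)]
```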