MySQL left outer join the same table multiple times? - mysql

I have a MySQL problem that seemed relatively simple but has turned out not to be.
I have two tables: one which holds a list of unique ids to display and another table which lists the ids next to a timestamp.
Table-A:
======
| ID |
======
| 1  |
| 2  |
| .. |
======
Table-B:
============================
| ID | Timestamp           |
============================
| 1  | 2015-10-10 00:00:00 |
| 1  | 2015-10-10 00:10:00 |
| 2  | 2015-10-10 00:00:00 |
============================
For each id, I need to display a boolean showing whether it has records in Table B between two date-times, together with the last date it was ever active.
I have tried something similar to this:
SELECT
a.`ID`,
MAX(b1.`Timestamp`) IS NOT NULL as 'Active',
MAX(b2.`Timestamp`) AS 'LastActive'
FROM `Table-A` a
LEFT OUTER JOIN `Table-B` b1
ON a.ID = b1.ID
AND b1.`Timestamp` BETWEEN #startTime AND #endTime
LEFT OUTER JOIN `Table-B` b2
ON a.ID = b2.ID
GROUP BY a.ID
;
I'm not sure why, but the query seems to run indefinitely without returning any results. Can anyone suggest the correct way to get the results I need?

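Aggregate the second table in two derived tables first, then join each one to the first table, so that each id matches at most one row per join: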
SELECT
a.*,
IF(b1.cnt IS NULL, FALSE, TRUE) AS is_found,
IFNULL(b2.dt, '-') AS max_dt
FROM table1 a
LEFT OUTER JOIN (
SELECT
id,
COUNT(*) AS cnt
FROM table2
WHERE
`timestamp` BETWEEN '2015-01-01' AND '2015-12-31'
GROUP BY 1) b1
ON a.id=b1.id
LEFT OUTER JOIN (
SELECT id,
MAX(TIMESTAMP) AS dt
FROM table2
GROUP BY 1) b2
ON a.id=b2.id
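A minimal sketch of why the pre-aggregated version behaves differently from the double self-join in the question. It runs against an in-memory SQLite database through Python's sqlite3 module rather than MySQL (an assumption made only so the example is self-contained), and the sample rows, including id 3 and the 2014 timestamp, are invented for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (id INTEGER, timestamp TEXT);
    INSERT INTO table1 VALUES (1), (2), (3);
    -- Hypothetical sample rows: id 1 is active in 2015, id 2 only in 2014.
    INSERT INTO table2 VALUES
        (1, '2015-10-10 00:00:00'),
        (1, '2015-10-10 00:10:00'),
        (2, '2014-05-01 00:00:00');
""")

# Double self-join as in the question: for one id, every b1 row is paired
# with every b2 row before GROUP BY gets a chance to collapse them.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM table1 a
    LEFT JOIN table2 b1 ON a.id = b1.id
    LEFT JOIN table2 b2 ON a.id = b2.id
    WHERE a.id = 1
""").fetchone()[0]

# Pre-aggregated derived tables: each join contributes at most one row per id.
result = conn.execute("""
    SELECT a.id,
           b1.cnt IS NOT NULL AS is_found,
           b2.dt AS max_dt
    FROM table1 a
    LEFT JOIN (SELECT id, COUNT(*) AS cnt
               FROM table2
               WHERE timestamp BETWEEN '2015-01-01' AND '2015-12-31'
               GROUP BY id) b1 ON a.id = b1.id
    LEFT JOIN (SELECT id, MAX(timestamp) AS dt
               FROM table2
               GROUP BY id) b2 ON a.id = b2.id
    ORDER BY a.id
""").fetchall()

print(joined_rows)
for row in result:
    print(row)
```

With two Table-B rows for id 1, the double join already produces four intermediate rows for that id; the count grows with the square of the rows per id, which is a plausible reason for the original query appearing to hang on a large table.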

Related

MySQL fill NULL row with value based on the nearest value in another row

I'm trying to update rows in a scores table based on the following logic:
Get the feat_sum for ids that don't have a score.
For each feat_sum that has a NULL score, get the row with the nearest feat_sum and score and then update the score field to that score.
If the feat_sum difference is identical, choose the smaller score.
id is the PK of the table
Initial table:
scores
| id | feat_sum | score |
| --- | --- | --- |
| 1 | 1.234 | 341 |
| 2 | 5.678 | 758 |
| 3 | 2.234 | NULL |
| 4 | 8.678 | NULL |
Expected output after query:
scores
| id | feat_sum | score |
| --- | --- | --- |
| 1 | 1.234 | 341 |
| 2 | 5.678 | 758 |
| 3 | 2.234 | 341 |
| 4 | 8.678 | 758 |
e.g. 1.234 is closer to 2.234 than 5.678 is to 2.234, therefore, the score for 2.234 should be 341.
I think I've got the base query here, but I'm struggling to put the last bit together.
SELECT
id,
feat_sum,
CASE
WHEN score IS NULL
THEN (SELECT score FROM scores WHERE feat_sum - some_other_feat_sum /* has smallest difference */
END AS score
FROM scores;
In the UPDATE statement join to the table a CTE that returns for each id with a score that is null the closest score by utilizing the window function FIRST_VALUE():
WITH cte AS (
SELECT DISTINCT s1.id,
FIRST_VALUE(s2.score) OVER (PARTITION BY s1.id ORDER BY ABS(s2.feat_sum - s1.feat_sum), s2.score) AS score
FROM scores s1 JOIN scores s2
WHERE s1.score IS NULL AND s2.score IS NOT NULL
)
UPDATE scores s
INNER JOIN cte c ON c.id = s.id
SET s.score = c.score;
In SQL Server, I'd use an APPLY operation, like this:
SELECT s.id, s.feat_sum, COALESCE(s.score, alt.score) as score
FROM scores s
CROSS APPLY (
SELECT TOP 1 score
FROM scores s0
WHERE s0.score IS NOT NULL
ORDER BY abs(s0.feat_sum - s.feat_sum), s0.score
) alt
Other databases call this a lateral join. I know MySQL supports this, but the documentation is not clear to me (it only shows the old bad A,B syntax), so this might not be quite right:
SELECT s.id, s.feat_sum, COALESCE(s.score, alt.score) as score
FROM scores s
JOIN LATERAL (
SELECT score
FROM scores s0
WHERE s0.score IS NOT NULL
ORDER BY abs(s0.feat_sum - s.feat_sum), s0.score
LIMIT 1
) alt
Most lateral joins that also use a LIMIT 1 can be re-written to run even faster using a windowing function instead. I haven't looked that far ahead yet on this query.
My thought process is as follows:
1. Get the rows with score = NULL.
2. Calculate the difference between those rows and all other rows, and rank them by difference, then score.
3. Join the rows with the least difference to the original table to get the score.
4. Finally, do a left join which conditionally shows the scores.
with noscore as
(
select *
from Table1
where score is null
),
alldiff as
(
select t1.id, t2.id as diffid, abs(t2.feat_sum-t1.feat_sum) as diff, t2.score
from noscore t1
inner join Table1 t2 on
t1.id != t2.id and t2.score is not null
order by diff,score asc
),
diff as
(
select *, row_number() over (partition by id order by diff asc, score asc) as nr
from alldiff
),
mindiff as
(
select df.id, t1.feat_sum, t1.score
from diff df
inner join Table1 t1 on df.diffid = t1.id
where df.nr = 1
),
updt as
(
select t1.id, t1.feat_sum, if(t1.score is null,md.score,t1.score) as score
from Table1 t1
left join mindiff md on t1.id = md.id
)
update Table1 t1 inner join updt u on t1.id = u.id set t1.score = u.score;
There may be a scope for query optimisation but I guess this works for now.

Select each row of table except where the id is not the maximum value for a given foreign key

Given a table such as the following called form_letters:
+---------------+----+
| respondent_id | id |
+---------------+----+
| 3 | 1 |
| 7 | 2 |
| 7 | 3 |
+---------------+----+
How can I select each of these rows except the ones that do not have the maximum id value for a given respondent_id?
Example results:
+---------------+----+
| respondent_id | id |
+---------------+----+
| 3 | 1 |
| 7 | 3 |
+---------------+----+
Something like this should work:
SELECT respondent_id, MAX(id) as id FROM form_letters
group by respondent_id
MySQL fiddle:
http://sqlfiddle.com/#!2/5c4dc0/2
There are many ways of doing it: GROUP BY with MAX(), NOT EXISTS, or a LEFT JOIN.
Here is the LEFT JOIN version, which tends to perform well when the joined columns are indexed:
select
f1.*
from form_letters f1
left join form_letters f2 on f1.respondent_id = f2.respondent_id
and f1.id < f2.id
where f2.respondent_id is null
Using NOT EXISTS:
select f1.*
from form_letters f1
where not exists
(
select 1 from form_letters f2
where f1.respondent_id = f2.respondent_id
and f1.id < f2.id
)
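A small sketch that runs the three approaches mentioned so far (GROUP BY with MAX(), the LEFT JOIN anti-join, and NOT EXISTS) on the sample data and compares their output. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL, which is an assumption of this example; the explicit column lists and ORDER BY clauses are added only to make the results comparable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE form_letters (respondent_id INTEGER, id INTEGER PRIMARY KEY);
    INSERT INTO form_letters VALUES (3, 1), (7, 2), (7, 3);
""")

queries = {
    # Aggregate: one row per respondent with its highest id.
    "group_by": """
        SELECT respondent_id, MAX(id) AS id
        FROM form_letters
        GROUP BY respondent_id
        ORDER BY respondent_id
    """,
    # Anti-join: keep f1 only when no row with a higher id exists.
    "left_join": """
        SELECT f1.respondent_id, f1.id
        FROM form_letters f1
        LEFT JOIN form_letters f2
               ON f1.respondent_id = f2.respondent_id
              AND f1.id < f2.id
        WHERE f2.respondent_id IS NULL
        ORDER BY f1.respondent_id
    """,
    # Same idea expressed with NOT EXISTS.
    "not_exists": """
        SELECT f1.respondent_id, f1.id
        FROM form_letters f1
        WHERE NOT EXISTS (
            SELECT 1
            FROM form_letters f2
            WHERE f1.respondent_id = f2.respondent_id
              AND f1.id < f2.id
        )
        ORDER BY f1.respondent_id
    """,
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

The anti-join and NOT EXISTS forms return whole rows, so they remain usable when the table has further columns that should come from the same row as the maximum id.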
Here's how I would do it. Get the max id per respondent in a subquery, then join it back to your original table. Next, limit to records where the ID equals the max id.
Select FL.Respondent_ID, FL.ID, A.Max_ID
From Form_Letters FL
left join (
select Respondent_ID, Max(ID) as Max_ID
from Form_Letters
group by Respondent_ID) A
on FL.Respondent_ID = A.Respondent_ID
where FL.ID = A.Max_ID

Grouping and aggregating with fields that don't need it

I have the following data:
| ID | Date | Code |
--------------------------
| 1 | 26/02/14 | 10 |
| 1 | 25/02/14 | 11 |
| 1 | 24/02/14 | 10 |
| 2 | 25/02/14 | 13 |
| 2 | 24/02/14 | 11 |
| 2 | 23/02/14 | 10 |
All I want is to group by the ID field and return the maximum value from the date field (i.e. most recent). So the final result should look like this:
| ID | Date | Code |
--------------------------
| 1 | 26/02/14 | 10 |
| 2 | 25/02/14 | 13 |
It seems though that if I want the "Code" field showing in the same query I also have to group or aggregate it as well... which makes sense because there could potentially be more than one value left on that field after the others are grouped/aggregated (even though there won't be in this case).
I thought I could handle this problem by doing the GroupBy and Max in a subquery on just those fields and then do a join on that subquery to bring in the "Code" field I don't want grouped or aggregated:
SELECT Q.ID, Q.MaxOfDate, A.Code
FROM
(SELECT B.ID, Max(B.Date) As MaxOfDate
FROM myTable As B
GROUP BY B.ID) As Q
LEFT JOIN myTable As A ON Q.ID = A.ID;
This isn't working though as it is still only giving me the original number of records I started with.
How do you do grouping and aggregation with fields you don't necessarily want grouped/aggregated?
An alternative to the answer I accepted:
SELECT Q.ID, Q.MaxOfDate, A.Code
FROM
(SELECT B.ID, Max(B.Date) As MaxOfDate
FROM myTable As B
GROUP BY B.ID) As Q
LEFT JOIN myTable As A ON (Q.ID = A.ID) AND (A.Date = Q.MaxOfDate);
Needed to do the LEFT JOIN on the Date field as well as the ID field.
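A short sketch of the difference the extra join condition makes, run against SQLite through Python's sqlite3 module (an assumption; the question does not say which database is in use). The dates are stored as ISO strings here, which is also an assumption, so that MAX() picks the most recent one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (ID INTEGER, Date TEXT, Code INTEGER);
    INSERT INTO myTable VALUES
        (1, '2014-02-26', 10),
        (1, '2014-02-25', 11),
        (1, '2014-02-24', 10),
        (2, '2014-02-25', 13),
        (2, '2014-02-24', 11),
        (2, '2014-02-23', 10);
""")

grouped = """
    (SELECT B.ID, MAX(B.Date) AS MaxOfDate
     FROM myTable AS B
     GROUP BY B.ID) AS Q
"""

# Joining on ID alone re-attaches every original row to its group.
id_only_count = conn.execute(
    "SELECT COUNT(*) FROM " + grouped + " LEFT JOIN myTable AS A ON Q.ID = A.ID"
).fetchone()[0]

# Joining on ID and the maximum date keeps only the most recent row per ID.
latest = conn.execute(
    "SELECT Q.ID, Q.MaxOfDate, A.Code FROM " + grouped +
    " LEFT JOIN myTable AS A ON Q.ID = A.ID AND A.Date = Q.MaxOfDate"
    " ORDER BY Q.ID"
).fetchall()

print(id_only_count)
print(latest)
```

If two rows of the same ID share the maximum date, this join returns both of them, so a tie-breaker is still needed in that case.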
If you want the CODE associated with the Max Date, you will have to use a subquery with a top 1, like this:
SELECT B.ID, Max(B.Date) As MaxOfDate,
(select top 1 C.Code
from myTable As C
where B.ID = C.ID
order by C.Date desc, C.Code) as Code
FROM myTable As B
GROUP BY B.ID

Mysql to select rows group by with order by another column

I am trying to select the rows from a table by 'group by' and ignoring the first row got by sorting the data by date. The sorting should be done by a date field, to ignore the newest entry and returning the old ones for the group.
The table looks like
+----+------------+------------+-----------+
| id | updated_on | group_name | list_name |
+----+------------+------------+-----------+
| 1  | 2013-04-03 | g1         | l1        |
| 2  | 2013-03-21 | g2         | l1        |
| 3  | 2013-02-26 | g2         | l1        |
| 4  | 2013-02-21 | g1         | l1        |
| 5  | 2013-02-20 | g1         | l1        |
| 6  | 2013-01-09 | g2         | l2        |
| 7  | 2013-01-10 | g2         | l2        |
| 8  | 2012-12-11 | g1         | l1        |
+----+------------+------------+-----------+
http://www.sqlfiddle.com/#!2/cec99/1
So, basically, I just want to return ids (3,4,5,6,8) as those are the oldest in the group_name and list_name. Ignoring the latest entry and returning the old ones by grouping it based on group_name and list_name
I am not able to write sql for this problem. I know order by will not work with group by. Please help me in figuring out a solution.
Thanks
And also, is there a way to do this without using subqueries?
Something like the following returns only the rows that are older than the newest date in their group:
select a.ID, a.updated_on, a.group_name, list_name
from data a
where
a.updated_on <
(
select max(updated_on)
from data
group by group_name having group_name = a.group_name
);
SQL Fiddle: http://www.sqlfiddle.com/#!2/00d43/10
Update (based on your reqs)
select a.ID, a.updated_on, a.group_name, list_name
from data a
where
a.updated_on <
(
select max(updated_on)
from data
group by group_name, list_name having group_name = a.group_name
and list_name = a.list_name
);
See: http://www.sqlfiddle.com/#!2/cec99/3
Update (using a simple subquery instead of a correlated subquery)
I decided the correlated subquery is too slow, based on: Subqueries vs joins
So I changed to joining with an aliased derived table built from a nested query.
select a.ID, a.updated_on, a.group_name, a.list_name
from data a,
(
select group_name, list_name , max(updated_on) as MAX_DATE
from data
group by group_name, list_name
) as MAXDATE
where
a.list_name = MAXDATE.list_name AND
a.group_name = MAXDATE.group_name AND
a.updated_on < MAXDATE.MAX_DATE
;
SQL Fiddle: http://www.sqlfiddle.com/#!2/5df64/8
You could try using the following query (yes, it has a nested join, but maybe it helps).
SELECT ID FROM
(select d1.ID FROM data d1 LEFT JOIN
data d2 ON (d1.group_name = d2.group_name AND d1.list_name=d2.list_name AND
d1.updated_on > d2.updated_on) WHERE d2.ID IS NULL) data_tmp;
CORRECTION:
SELECT DISTINCT(ID) FROM
(select d1.* FROM data d1 LEFT JOIN
data d2 ON (d1.group_name = d2.group_name AND d1.list_name=d2.list_name AND
d1.updated_on < d2.updated_on) WHERE d2.ID IS NOT NULL) date_tmp;
SELECT DISTINCT y.id
FROM data x
JOIN data y
ON y.group_name = x.group_name
AND y.list_name = x.list_name
AND y.updated_on < x.updated_on;
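A quick check of the subquery-free self-join on the sample rows from the question, run against SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption of this sketch). The ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (
        id INTEGER PRIMARY KEY,
        updated_on TEXT,
        group_name TEXT,
        list_name TEXT
    );
    INSERT INTO data VALUES
        (1, '2013-04-03', 'g1', 'l1'),
        (2, '2013-03-21', 'g2', 'l1'),
        (3, '2013-02-26', 'g2', 'l1'),
        (4, '2013-02-21', 'g1', 'l1'),
        (5, '2013-02-20', 'g1', 'l1'),
        (6, '2013-01-09', 'g2', 'l2'),
        (7, '2013-01-10', 'g2', 'l2'),
        (8, '2012-12-11', 'g1', 'l1');
""")

# A row y qualifies when some row x in the same (group_name, list_name)
# pair is newer than it; DISTINCT removes the duplicates this produces.
older_ids = [
    row[0]
    for row in conn.execute("""
        SELECT DISTINCT y.id
        FROM data x
        JOIN data y
          ON y.group_name = x.group_name
         AND y.list_name = x.list_name
         AND y.updated_on < x.updated_on
        ORDER BY y.id
    """)
]
print(older_ids)
```

Rows that tie for the newest date in their group are all excluded by the strict comparison, which may or may not be the intended behaviour.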

MySQL JOIN based on highest date and non-unique columns

I need some help with a MySQL query I'm working on. I have data as follows.
Table 1
id date1 text number
---|------------|--------|-------
1 | 2012-12-12 | hi | 399
2 | 2011-11-11 | so | 399
5 | 2010-10-10 | what | 555
3 | 2009-09-09 | bye | 300
4 | 2008-08-08 | you | 300
Table 2
id number date2 ref
---|--------|------------|----
1 | 399 | 2012-06-06 | 40
2 | 399 | 2011-06-06 | 50
5 | 555 | 2011-03-03 | 60
For each row in Table 1, I want to get zero or one ref values from Table 2. There should be a row in the result for each row in Table 1. The number column isn't unique to either table, so the join must be made using the date1 & date2 columns, where date2 is the highest value for the number without exceeding date1 for that number.
The desired result from the above example would be like so.
date1 text number ref
------------|--------|--------|-----
2012-12-12 | hi | 399 | 40
2011-11-11 | so | 399 | 50
2010-10-10 | what | 555 | null
2009-09-09 | bye | 300 | null
2008-08-08 | you | 300 | null
You can see in the result's first row that ref 40 was chosen because, in table2, the record with ref=40 had a date2 that was less than date1 and was the highest date that met that condition.
In the result's second row, ref 50 was chosen because, in table2, the record with ref=50 had a date2 that was less than date1 and was the highest date that met that condition.
The rest of the results have null refs because date1 is earlier than every date2 for that number, or because a corresponding number doesn't exist in table2.
I've got to a certain point but I'm stuck. The query I have so far is like this.
SELECT date1, text, table1.number, ref
FROM table1
LEFT JOIN (
SELECT *
FROM (
SELECT *
FROM table2
WHERE date2 <= '2012-12-12'
ORDER BY date2 DESC
) sorted
GROUP BY number
) tmp ON table1.number = tmp.number;
The problem is that the hard coded date won't do, it should be based on date1, but I can't use date1 because it's in the outer query. Is there a way I can make this work?
I tried a similar example with different tables just now and was able to get what you wanted. Below is a similar query modified to fit your needs. You might want to change < to <= if that is what you are looking for.
SELECT a.date1, a.text, b.ref
FROM table1 a LEFT JOIN table2 b ON
( a.number = b.number
AND a.date1 > b.date2
AND b.date2 = ( SELECT MAX(x.date2)
FROM table2 x
WHERE x.number = b.number
AND x.date2 < a.date1)
)
Untested:
SELECT t1.date1,
t1.text,
t1.number,
(SELECT a.ref
FROM TABLE_2 a
JOIN (SELECT t.number,
MAX(t.date2) AS max_date
FROM TABLE_2 t
WHERE t.number = t1.number
AND t.date2 <= t1.date1
GROUP BY t.number) b ON b.number = a.number
AND b.max_date = a.date2)
FROM TABLE_1 t1
The issue is the use of t1 in the derived table of the subselect...
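One way to sidestep the derived-table problem is a correlated scalar subquery with ORDER BY ... LIMIT 1 directly in the select list. This is a different technique from the two answers above, sketched here against SQLite through Python's sqlite3 module rather than MySQL (an assumption made so the example is self-contained). It uses <= because the question asks for a date2 "without exceeding" date1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, date1 TEXT, "text" TEXT, number INTEGER);
    CREATE TABLE table2 (id INTEGER, number INTEGER, date2 TEXT, ref INTEGER);
    INSERT INTO table1 VALUES
        (1, '2012-12-12', 'hi',   399),
        (2, '2011-11-11', 'so',   399),
        (5, '2010-10-10', 'what', 555),
        (3, '2009-09-09', 'bye',  300),
        (4, '2008-08-08', 'you',  300);
    INSERT INTO table2 VALUES
        (1, 399, '2012-06-06', 40),
        (2, 399, '2011-06-06', 50),
        (5, 555, '2011-03-03', 60);
""")

# For each table1 row, pick the ref of the latest table2 row with the same
# number whose date2 does not exceed date1; no match yields NULL.
rows = conn.execute("""
    SELECT t1.date1,
           t1."text",
           t1.number,
           (SELECT t2.ref
            FROM table2 t2
            WHERE t2.number = t1.number
              AND t2.date2 <= t1.date1
            ORDER BY t2.date2 DESC
            LIMIT 1) AS ref
    FROM table1 t1
    ORDER BY t1.date1 DESC
""").fetchall()

for row in rows:
    print(row)
```

Because the subquery references t1 directly instead of from inside a derived table, it keeps one output row per table1 row without needing an outer join.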