Joining a second table and selecting the first entry - mysql

I have been having some trouble getting my head around achieving the following.
I have an 'applications' table and an 'application_logs' table. I am attempting to select all the applications where the 'type' is equal to 'test', then join the 'application_logs' table and retrieve only the first log entry for each application.
One of the queries I tried, and the one I understood best, was this (whilst it didn't fail, it looked like it was stuck in an endless loop):
SELECT applications.id FROM applications JOIN application_logs ON application_logs.application_id =
(
SELECT application_logs.id FROM application_logs
WHERE application_logs.application_id = applications.id
ORDER BY id DESC
LIMIT 1
)
WHERE type = 'test';
There were some other queries (using CROSS APPLY/DISTINCT) I attempted, but they didn't make sense to me and didn't seem to be trying to achieve the same thing. I appreciate all the help :)

There are many ways to achieve this in standard SQL. A lateral join (LATERAL in standard SQL, CROSS APPLY in SQL Server) is the first to come to mind, but MySQL only supports LATERAL derived tables as of version 8.0.14. Another would be FETCH FIRST ROWS WITH TIES to get all the latest application logs, but MySQL doesn't support it (and its counterpart LIMIT doesn't have a ties clause either). If you are only interested in the application ID alone (as shown in your query), one could even combine this with an INTERSECT operation, but MySQL only supports INTERSECT as of version 8.0.31.
In the end, you want to find the maximum log ID per application ID. As of MySQL 8 you can do this on the fly with a window function:
select *
from applications a
left join
(
select
application_logs.*,
max(id) over (partition by application_id) as max_id
from application_logs
) al on al.application_id = a.id and al.id = al.max_id
where a.type = 'test';
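To sanity-check the window-function version, here is a minimal sketch that runs it against an in-memory SQLite database. It assumes SQLite 3.25 or later as a stand-in for MySQL 8, since both accept MAX() OVER (PARTITION BY ...). The schema is cut down to the columns the query touches, and the rows are hypothetical. Application 4 has no logs, so the LEFT JOIN keeps it with a NULL (None) log id.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL 8; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applications (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE application_logs (id INTEGER PRIMARY KEY, application_id INTEGER);
    INSERT INTO applications VALUES (1, 'test'), (2, 'test'), (3, 'prod'), (4, 'test');
    INSERT INTO application_logs VALUES (10, 1), (11, 1), (12, 1), (20, 2), (21, 2), (30, 3);
""")

# Every log row carries the highest log id of its application (max_id);
# the join condition keeps only the row whose own id equals that maximum.
latest = conn.execute("""
    SELECT a.id, al.id
    FROM applications a
    LEFT JOIN (
        SELECT application_logs.*,
               MAX(id) OVER (PARTITION BY application_id) AS max_id
        FROM application_logs
    ) al ON al.application_id = a.id AND al.id = al.max_id
    WHERE a.type = 'test'
    ORDER BY a.id
""").fetchall()

print(latest)  # application 4 has no logs, so its log id is None
```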
In earlier versions you would do this in separate steps. One way would be this:
select *
from applications a
left join application_logs al
on al.application_id = a.id
and (al.application_id, al.id) in
(
select application_id, max(id)
from application_logs
group by application_id
)
where a.type = 'test';
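The row-constructor version can be checked the same way. This sketch assumes SQLite 3.15 or later as a stand-in for MySQL, because that is the version that accepts row values such as (al.application_id, al.id) IN (SELECT ...). It uses hypothetical rows and should give the same result as the window-function query.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applications (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE application_logs (id INTEGER PRIMARY KEY, application_id INTEGER);
    INSERT INTO applications VALUES (1, 'test'), (2, 'test'), (3, 'prod'), (4, 'test');
    INSERT INTO application_logs VALUES (10, 1), (11, 1), (12, 1), (20, 2), (21, 2), (30, 3);
""")

# Step 1 (subquery): the (application_id, max log id) pairs.
# Step 2 (join): keep only log rows whose pair is in that set.
latest = conn.execute("""
    SELECT a.id, al.id
    FROM applications a
    LEFT JOIN application_logs al
      ON al.application_id = a.id
     AND (al.application_id, al.id) IN (
           SELECT application_id, MAX(id)
           FROM application_logs
           GROUP BY application_id)
    WHERE a.type = 'test'
    ORDER BY a.id
""").fetchall()

print(latest)
```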
Another is your own query, where you only got confused with the IDs:
SELECT *
FROM applications a
JOIN application_logs al ON al.id =
(
SELECT almax.id
FROM application_logs almax
WHERE almax.application_id = a.id
ORDER BY almax.id DESC
LIMIT 1
)
WHERE a.type = 'test';
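Here is a small sketch showing why the original query returned nothing useful and how the corrected one behaves. It again uses an in-memory SQLite database as a stand-in for MySQL, with hypothetical rows. Because the inner join drops applications that have no logs, application 4 does not appear in the corrected result.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applications (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE application_logs (id INTEGER PRIMARY KEY, application_id INTEGER);
    INSERT INTO applications VALUES (1, 'test'), (2, 'test'), (3, 'prod'), (4, 'test');
    INSERT INTO application_logs VALUES (10, 1), (11, 1), (12, 1), (20, 2), (21, 2), (30, 3);
""")

# The question's query compares application_id with a *log* id. With these ids
# nothing matches; with real data it could just as easily match unrelated rows.
original = conn.execute("""
    SELECT applications.id
    FROM applications
    JOIN application_logs ON application_logs.application_id = (
        SELECT application_logs.id FROM application_logs
        WHERE application_logs.application_id = applications.id
        ORDER BY id DESC
        LIMIT 1)
    WHERE type = 'test'
""").fetchall()

# Corrected: compare the log's own id with the newest log id of the application.
fixed = conn.execute("""
    SELECT a.id, al.id
    FROM applications a
    JOIN application_logs al ON al.id = (
        SELECT almax.id
        FROM application_logs almax
        WHERE almax.application_id = a.id
        ORDER BY almax.id DESC
        LIMIT 1)
    WHERE a.type = 'test'
    ORDER BY a.id
""").fetchall()

print(original, fixed)
```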

Try this:
SELECT
a.*, c.*
FROM
applications a
INNER JOIN (
SELECT
al.application_id AS application_id,
MAX(al.id) AS al_id
FROM
application_logs al
GROUP BY
al.application_id
) c ON a.id = c.application_id
WHERE
a.type = 'test';

To get the first entry from the application_logs table for each application_id, you can use ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...):
SELECT *
FROM applications A
LEFT JOIN (
SELECT S.*
FROM (
SELECT application_logs.*,
ROW_NUMBER() OVER (PARTITION BY application_id ORDER BY id) AS R
FROM application_logs
) AS S
WHERE S.R = 1
) AS L ON L.application_id = A.id
WHERE A.type = 'test';

Try this:
SELECT a.id, ay.*
FROM applications AS a
INNER JOIN (
SELECT al.application_id, min(al.id) as Min_Id
FROM application_logs AS al
GROUP BY al.application_id
) AS ax ON ax.application_id = a.id
INNER JOIN application_logs AS ay ON ay.id = ax.Min_Id
WHERE a.type = 'test';
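Here is a minimal sketch of this MIN(id) approach, using SQLite as a stand-in for MySQL. The message column and all rows are hypothetical. They are included only to show that the second join brings back the whole first log row, not just its id.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the message column and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applications (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE application_logs (id INTEGER PRIMARY KEY, application_id INTEGER, message TEXT);
    INSERT INTO applications VALUES (1, 'test'), (2, 'test'), (3, 'prod');
    INSERT INTO application_logs VALUES
        (10, 1, 'created'), (11, 1, 'updated'), (12, 1, 'closed'),
        (20, 2, 'created'), (21, 2, 'updated'), (30, 3, 'created');
""")

# ax: smallest ("first") log id per application; ay: the full log row with that id.
first_logs = conn.execute("""
    SELECT a.id, ay.id, ay.message
    FROM applications AS a
    INNER JOIN (
        SELECT al.application_id, MIN(al.id) AS Min_Id
        FROM application_logs AS al
        GROUP BY al.application_id
    ) AS ax ON ax.application_id = a.id
    INNER JOIN application_logs AS ay ON ay.id = ax.Min_Id
    WHERE a.type = 'test'
    ORDER BY a.id
""").fetchall()

print(first_logs)
```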

Your query suggests that "first" means the largest id. (Colloquially, I would expect "first" to mean the smallest id or earliest chronologically.)
I usually recommend a correlated subquery for the filtering. It is worth testing if this is faster than other methods:
select . . .
from applications a join
application_logs al
on al.application_id = a.id
where a.type = 'test' and
al.id = (select max(al2.id)
from application_logs al2
where al2.application_id = al.application_id
);
The optimal indexes for performance are:
applications(type, id)
application_logs(application_id, id)

Related

How can I merge these two left joins into a single one?

How can I merge these two left joins: http://sqlfiddle.com/#!9/1d2954/69/0
SELECT d.`id`, (adcount + bdcount)
FROM `docs` d
LEFT JOIN
(
SELECT da.`doc_id`, COUNT(da.`doc_id`) AS adcount FROM `docs_scod_a` da
INNER JOIN `scod_a` a ON a.`id` = da.`scod_a_id`
WHERE a.`ver_a` IN ('AA', 'AB')
GROUP BY da.`doc_id`
) ad ON ad.`doc_id` = d.`id`
LEFT JOIN
(
SELECT db.`doc_id`, COUNT(db.`doc_id`) AS bdcount FROM `docs_scod_b` db
INNER JOIN `scod_b` b ON b.`id` = db.`scod_b_id`
WHERE b.`ver_b` IN ('BA', 'BB')
GROUP BY db.`doc_id`
) bd ON bd.`doc_id` = d.`id`
into a single LEFT JOIN, just to make it easier to use in my code, without making it any slower?
Let me first emphasize that your method of doing the calculation is the better method. You have two separate dimensions and aggregating them separately is often the most efficient method for doing the calculation. It is also the most scalable method.
That said, your query should be equivalent to this version:
SELECT d.id,
count(distinct a.id),
count(distinct b.id)
FROM docs d left join
docs_scod_a da
ON da.doc_id = d.id LEFT JOIN
scod_a a
ON a.id = da.scod_a_id AND a.ver_a IN ('AA', 'AB') LEFT JOIN
docs_scod_b db
ON db.doc_id = d.id LEFT JOIN
scod_b b
ON b.id = db.scod_b_id AND b.ver_b IN ('BA', 'BB')
GROUP BY d.id
ORDER BY d.id;
This query is more expensive than it looks, because the COUNT(DISTINCT) incurs additional overhead compared to COUNT().
And here is the SQL Fiddle.
And, because LEFT JOIN can return NULL values, your query is more correctly written as:
SELECT d.`id`, COALESCE(adcount, 0) + COALESCE(bdcount, 0)
If you were having problems with the results, this small change might fix those problems.
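To see the NULL problem concretely, this sketch runs the original two-LEFT JOIN query on a few hypothetical rows, with SQLite standing in for MySQL. It runs the query twice: once with adcount + bdcount and once with the COALESCE version. A document that has matches in only one of the two mapping tables gets NULL from the first form.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docs (id INTEGER PRIMARY KEY);
    CREATE TABLE scod_a (id INTEGER PRIMARY KEY, ver_a TEXT);
    CREATE TABLE scod_b (id INTEGER PRIMARY KEY, ver_b TEXT);
    CREATE TABLE docs_scod_a (doc_id INTEGER, scod_a_id INTEGER);
    CREATE TABLE docs_scod_b (doc_id INTEGER, scod_b_id INTEGER);
    INSERT INTO docs VALUES (1), (2), (3);
    INSERT INTO scod_a VALUES (1, 'AA'), (2, 'AB'), (3, 'ZZ');
    INSERT INTO scod_b VALUES (1, 'BA'), (2, 'XX');
    INSERT INTO docs_scod_a VALUES (1, 1), (1, 2), (2, 3), (3, 1);
    INSERT INTO docs_scod_b VALUES (1, 1), (2, 1), (2, 2);
""")

# The original query; {total} is swapped for the two select-list variants.
QUERY = """
    SELECT d.id, {total}
    FROM docs d
    LEFT JOIN (
        SELECT da.doc_id, COUNT(da.doc_id) AS adcount
        FROM docs_scod_a da
        INNER JOIN scod_a a ON a.id = da.scod_a_id
        WHERE a.ver_a IN ('AA', 'AB')
        GROUP BY da.doc_id
    ) ad ON ad.doc_id = d.id
    LEFT JOIN (
        SELECT db.doc_id, COUNT(db.doc_id) AS bdcount
        FROM docs_scod_b db
        INNER JOIN scod_b b ON b.id = db.scod_b_id
        WHERE b.ver_b IN ('BA', 'BB')
        GROUP BY db.doc_id
    ) bd ON bd.doc_id = d.id
    ORDER BY d.id
"""

# NULL + number is NULL, so docs missing from either derived table lose their total.
raw = conn.execute(QUERY.format(total="adcount + bdcount")).fetchall()
safe = conn.execute(QUERY.format(total="COALESCE(adcount, 0) + COALESCE(bdcount, 0)")).fetchall()

print(raw, safe)
```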
Performance may be a big problem, depending on the size of each table. It appears to be an "inflate-deflate" situation, since it first "inflates" the number of rows via JOIN, then "deflates" via GROUP BY. The formulation below avoids inflation-deflation.
But first, if I understand this subquery correctly, this
SELECT da.`doc_id`, COUNT(da.`doc_id`) AS adcount
FROM `docs_scod_a` da
INNER JOIN `scod_a` a ON a.`id` = da.`scod_a_id`
WHERE a.`ver_a` IN ('AA', 'AB')
GROUP BY da.`doc_id`
can be rewritten as
SELECT `doc_id`,
( SELECT COUNT(*)
FROM `scod_a`
WHERE `id` = da.`scod_a_id`
AND `ver_a` IN ('AA', 'AB')
) AS adcount
FROM `docs_scod_a` AS da
If that is correct, then the entire query becomes
SELECT d.id,
( SELECT COUNT(*)
FROM docs_scod_a ds
JOIN scod_a s ON s.id = ds.scod_a_id
WHERE ds.doc_id = d.id
AND s.ver_a IN ('AA', 'AB')
) +
( SELECT COUNT(*)
FROM docs_scod_b ds
JOIN scod_b s ON s.id = ds.scod_b_id
WHERE ds.doc_id = d.id
AND s.ver_b IN ('BA', 'BB')
)
FROM docs AS d
This formulation needs these indexes:
docs_scod_a: (doc_id, scod_a_id), (scod_a_id, doc_id)
docs_scod_b: (doc_id, scod_b_id), (scod_b_id, doc_id)
scod_a: (ver_a, id)
scod_b: (ver_b, id)
docs: -- presumably has PRIMARY KEY(id)
Note the lack of GROUP BY.
docs_scod_a smells like a many-to-many mapping table. I recommend you follow the tips here.
(No COALESCE is needed since COUNT will simply return zero.)
(I don't know whether my version is better (faster or whatever) than Gordon's, nor whether my indexes will help his formulation.)
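Here is a sketch of the scalar-subquery formulation on hypothetical data, with SQLite standing in for MySQL. Each correlated COUNT(*) returns 0 rather than NULL when nothing matches, which is why no COALESCE is needed and no GROUP BY appears.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docs (id INTEGER PRIMARY KEY);
    CREATE TABLE scod_a (id INTEGER PRIMARY KEY, ver_a TEXT);
    CREATE TABLE scod_b (id INTEGER PRIMARY KEY, ver_b TEXT);
    CREATE TABLE docs_scod_a (doc_id INTEGER, scod_a_id INTEGER);
    CREATE TABLE docs_scod_b (doc_id INTEGER, scod_b_id INTEGER);
    INSERT INTO docs VALUES (1), (2), (3);
    INSERT INTO scod_a VALUES (1, 'AA'), (2, 'AB'), (3, 'ZZ');
    INSERT INTO scod_b VALUES (1, 'BA'), (2, 'XX');
    INSERT INTO docs_scod_a VALUES (1, 1), (1, 2), (2, 3), (3, 1);
    INSERT INTO docs_scod_b VALUES (1, 1), (2, 1), (2, 2);
""")

# One row per doc; each subquery counts that doc's qualifying mappings.
totals = conn.execute("""
    SELECT d.id,
           ( SELECT COUNT(*)
             FROM docs_scod_a ds
             JOIN scod_a s ON s.id = ds.scod_a_id
             WHERE ds.doc_id = d.id
               AND s.ver_a IN ('AA', 'AB') )
         + ( SELECT COUNT(*)
             FROM docs_scod_b ds
             JOIN scod_b s ON s.id = ds.scod_b_id
             WHERE ds.doc_id = d.id
               AND s.ver_b IN ('BA', 'BB') ) AS total
    FROM docs AS d
    ORDER BY d.id
""").fetchall()

print(totals)
```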

How to join latest record for each foreign key without Inner select using group by and then on clause?

I have two tables r_instance(id as primary key,name,user_id,..etc) and r_response(id,comment,r_instance_id as Foreign key).
Each r_instance row has multiple r_response rows (say, at least 3).
I want to get the latest id and comment while joining r_response with r_instance.
But I want to do this without a GROUP BY subquery on r_response that is then joined back with an ON clause, because that degrades query performance. When I check the query with EXPLAIN, the type column should not show ALL.
My query is :
SELECT ri.id, ri.name, rr.id, rr.comment
FROM r_instance ri
JOIN (SELECT MAX(id) maxResponseId, r_instance_id instanceId
from r_response
GROUP BY r_instance_id) lastRes ON lastRes.instanceId = ri.id
JOIN r_response rr ON rr.id = lastRes.maxResponseId
You could use the window function ROW_NUMBER() (MySQL 8.0+):
SELECT * FROM
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY r_instance_id ORDER BY id DESC) Sq
from r_response
) a
INNER JOIN r_instance R ON R.id = a.r_instance_id
WHERE a.Sq = 1
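Here is a minimal check of the ROW_NUMBER() query. It assumes SQLite 3.25 or later as a stand-in for MySQL 8 and uses hypothetical rows. The index on r_response(r_instance_id, id) mirrors the one recommended in this thread, but its name idx_response_instance is made up.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL 8; rows and index name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r_instance (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE r_response (id INTEGER PRIMARY KEY, comment TEXT, r_instance_id INTEGER);
    CREATE INDEX idx_response_instance ON r_response (r_instance_id, id);
    INSERT INTO r_instance VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO r_response VALUES
        (1, 'a1', 1), (2, 'b1', 2), (3, 'a2', 1),
        (4, 'b2', 2), (5, 'a3', 1), (6, 'b3', 2);
""")

# Number each instance's responses newest-first and keep number 1.
latest = conn.execute("""
    SELECT R.id, R.name, a.id, a.comment
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY r_instance_id ORDER BY id DESC) AS Sq
        FROM r_response
    ) a
    INNER JOIN r_instance R ON R.id = a.r_instance_id
    WHERE a.Sq = 1
    ORDER BY R.id
""").fetchall()

print(latest)
```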
Here is another method:
SELECT ri.id, ri.name, rr.id, rr.comment
FROM r_instance ri
JOIN r_response rr
ON ri.id = rr.r_instance_id
WHERE rr.id = (SELECT MAX(rr2.id)
FROM r_response rr2
WHERE rr2.r_instance_id = rr.r_instance_id
);
For performance, you want an index on r_response(r_instance_id, id).
I should note that this does not always give the best performance. It is another strategy for expressing the same logic, resulting in a different execution plan. It might result in better performance.

Left join sql query

I want to get all the data from the users table and the last record associated with each user from my connection_history table. It works only when I don't add this at the end of my query:
ORDER BY contributions DESC
(When I add it, I only get the records that come from users, not the last connection_history record.)
My question is: how can I get the entire data set ordered by contributions DESC?
SELECT * FROM users LEFT JOIN connections_history ch ON users.id = ch.guid
AND EXISTS (SELECT 1
FROM connections_history ch1
WHERE ch.guid = ch1.guid
HAVING Max(ch1.date) = ch.date)
The order by should not affect the results that are returned. It only changes the ordering. You are probably getting what you want, just in an unexpected order. For instance, your query interface might be returning a fixed number of rows. Changing the order of the rows could make it look like the result set is different.
I will say that I find = to be more intuitive than EXISTS for this purpose:
SELECT *
FROM users u LEFT JOIN
connections_history ch
ON u.id = ch.guid AND
ch.date = (SELECT Max(ch1.date)
FROM connections_history ch1
WHERE ch.guid = ch1.guid
)
ORDER BY contributions DESC;
The reason is that the = is directly in the ON clause, so it is clear what the relationship between the tables is.
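Here is a small sketch of the = version, with SQLite standing in for MySQL. The users and connections_history rows are hypothetical, and dates are stored as ISO-8601 text so that MAX() compares them chronologically. User 3 has no history, so the LEFT JOIN keeps that user with a NULL date; ORDER BY contributions DESC only changes the order of the rows.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, contributions INTEGER);
    CREATE TABLE connections_history (guid INTEGER, date TEXT);
    INSERT INTO users VALUES (1, 'ann', 5), (2, 'bob', 9), (3, 'cid', 1);
    INSERT INTO connections_history VALUES
        (1, '2024-01-01'), (1, '2024-03-01'), (2, '2024-02-15');
""")

# The latest-date condition sits in the ON clause, so users without history survive.
rows = conn.execute("""
    SELECT u.id, ch.date
    FROM users u
    LEFT JOIN connections_history ch
      ON u.id = ch.guid
     AND ch.date = (SELECT MAX(ch1.date)
                    FROM connections_history ch1
                    WHERE ch.guid = ch1.guid)
    ORDER BY u.contributions DESC
""").fetchall()

print(rows)
```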
For your casual consideration, a different formatting of the original code. Note in particular the indented AND suggests the clause is part of the LEFT JOIN, which it is.
SELECT * FROM users
LEFT JOIN connections_history ch ON
users.id = ch.guid
AND EXISTS (SELECT 1
FROM connections_history ch1
WHERE ch.guid = ch1.guid
HAVING Max(ch1.date) = ch.date
)
We can use a nested query to first find the row with the max date for each user, then join it back to users. This assumes every user has at least one record in the connection history table; otherwise, use a LEFT JOIN instead.
select B.*, X.*
from users B
join (
select A.*
from connections_history A
where A.date = (
select max(date) from connections_history where guid = A.guid)
) X on X.guid = B.id
order by B.contributions DESC;

How to optimize this complicated query?

When I run the following query on MySQL, it gets locked:
SELECT event_list.*
FROM event_list
INNER JOIN members
ON members.profilenam=event_list.even_loc
WHERE (even_own IN (SELECT frd_id
FROM network
WHERE mem_id='911'
GROUP BY frd_id)
OR even_own = '911' )
AND event_list.even_active = 'y'
GROUP BY event_list.even_id
ORDER BY event_list.even_stat ASC
The inner query inside the IN condition returns many frd_id values, so the query above is very slow. Please help.
Thanks.
Try this:
SELECT el.*
FROM event_list el
INNER JOIN members m ON m.profilenam = el.even_loc
WHERE el.even_active = 'y' AND
(el.even_own = 911 OR EXISTS (SELECT 1 FROM network n WHERE n.mem_id=911 AND n.frd_id = el.even_own))
GROUP BY el.even_id
ORDER BY el.even_stat ASC
You don't need the GROUP BY on the inner query, that will be making the database engine do a lot of unneeded work.
If you put even_own = '911' before the select from network, then if even_own IS 911 then it will not have to do the subquery.
Also, why do you have a GROUP BY on the subquery?
Also, run EXPLAIN to find out what is taking the time.
This might work better:
( SELECT e.*
FROM event_list AS e
INNER JOIN members AS m ON m.profilenam = e.even_loc
JOIN network AS n ON e.even_own = n.frd_id
WHERE n.mem_id = '911'
AND e.even_active = 'y' )
UNION DISTINCT
( SELECT e.*
FROM event_list AS e
INNER JOIN members AS m ON m.profilenam = e.even_loc
WHERE e.even_own = '911'
AND e.even_active = 'y' )
ORDER BY even_stat ASC
Since I don't know whether the JOINs are one-to-many (or something else), I threw in DISTINCT to avoid duplicates. There may be a better way, or it may be unnecessary (in which case use UNION ALL).
Notice how I avoid two things that are performance killers:
OR -- turned into UNION
IN (SELECT...) -- turned into JOIN.
I made aliases to cut down on the clutter. I moved the ORDER BY outside the UNION (and added parens to make it work right).
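Here is a sketch that checks, on hypothetical rows, that the UNION rewrite returns the same events as the original OR / IN (SELECT ...) filter, with SQLite as a stand-in for MySQL. SQLite does not accept parentheses around the members of a UNION and has no UNION DISTINCT keyword (plain UNION already removes duplicates), so those two details differ from the MySQL text. The outer GROUP BY from the question is also left out.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (profilenam TEXT);
    CREATE TABLE network (mem_id TEXT, frd_id TEXT);
    CREATE TABLE event_list (even_id INTEGER PRIMARY KEY, even_own TEXT,
                             even_loc TEXT, even_active TEXT, even_stat INTEGER);
    INSERT INTO members VALUES ('hall'), ('park');
    INSERT INTO network VALUES ('911', '100'), ('911', '200'), ('500', '300');
    INSERT INTO event_list VALUES
        (1, '911', 'hall', 'y', 3),
        (2, '100', 'park', 'y', 1),
        (3, '200', 'hall', 'n', 2),
        (4, '300', 'hall', 'y', 4),
        (5, '200', 'nowhere', 'y', 5),
        (6, '200', 'park', 'y', 2);
""")

# Original shape: OR combined with IN (SELECT ...).
with_or = conn.execute("""
    SELECT e.even_id
    FROM event_list e
    JOIN members m ON m.profilenam = e.even_loc
    WHERE (e.even_own IN (SELECT frd_id FROM network WHERE mem_id = '911')
           OR e.even_own = '911')
      AND e.even_active = 'y'
    ORDER BY e.even_stat
""").fetchall()

# Rewrite: IN turned into a JOIN, OR turned into a UNION of two simpler queries.
# The final ORDER BY applies to the whole UNION and names an output column.
with_union = conn.execute("""
    SELECT e.even_id, e.even_stat AS even_stat
    FROM event_list AS e
    JOIN members AS m ON m.profilenam = e.even_loc
    JOIN network AS n ON e.even_own = n.frd_id
    WHERE n.mem_id = '911' AND e.even_active = 'y'
    UNION
    SELECT e.even_id, e.even_stat AS even_stat
    FROM event_list AS e
    JOIN members AS m ON m.profilenam = e.even_loc
    WHERE e.even_own = '911' AND e.even_active = 'y'
    ORDER BY even_stat
""").fetchall()

print([r[0] for r in with_or], [r[0] for r in with_union])
```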

MAX() Function not working as expected

I've created an SQL Fiddle to try and get my head around this: http://sqlfiddle.com/#!2/21e72/1
In the query, I have put a max() on the compiled_date column, but the recommendation column is still coming through incorrectly. I'm assuming that a select statement will need to be inserted on line 3 somehow?
I've tried the examples provided by the commenters below but I think I just need to understand this from a basic query to begin with.
As others have pointed out, the issue is that some of the select columns are neither aggregated nor used in the group by clause. Most DBMSs won't allow this at all, but MySQL is a little relaxed on some of the standards...
So, you need to first find the max(compiled_date) for each case, then find the recommendation that goes with it.
select r.case_number, r.compiled_date, r.recommendation
from reporting r
join (
SELECT case_number, max(compiled_date) as lastDate
from reporting
group by case_number
) s on r.case_number=s.case_number
and r.compiled_date=s.lastDate
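Here is a minimal sketch of this join-back on the reporting table, with SQLite standing in for MySQL and hypothetical rows (dates stored as ISO-8601 text). Each case gets exactly the recommendation from its latest compiled_date. If two reports share that date, both are returned.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reporting (case_number INTEGER, compiled_date TEXT, recommendation TEXT);
    INSERT INTO reporting VALUES
        (1, '2024-01-01', 'hold'),
        (1, '2024-02-01', 'approve'),
        (2, '2024-01-15', 'reject'),
        (2, '2024-01-10', 'review');
""")

# s: latest date per case; the join picks the report row written on that date.
latest = conn.execute("""
    SELECT r.case_number, r.compiled_date, r.recommendation
    FROM reporting r
    JOIN (
        SELECT case_number, MAX(compiled_date) AS lastDate
        FROM reporting
        GROUP BY case_number
    ) s ON r.case_number = s.case_number
       AND r.compiled_date = s.lastDate
    ORDER BY r.case_number
""").fetchall()

print(latest)
```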
Thank you for providing the SQL Fiddle. However, only the reporting data is given; we would really appreciate it if you gave us sample data for all the tables.
Anyway, could you try this?
SELECT
`case`.number,
staff.staff_name AS `case owner`,
client.client_name,
`case`.address,
x.mx_date,
report.recommendation
FROM
`case` INNER JOIN (
SELECT case_number, MAX(compiled_date) as mx_date
FROM report
GROUP BY case_number
) x ON x.case_number = `case`.number
INNER JOIN report ON x.case_number = report.case_number AND report.compiled_date = x.mx_date
INNER JOIN client ON `case`.client_number = client.client_number
INNER JOIN staff ON `case`.staff_number = staff.staff_number
WHERE
`case`.active = 1
AND staff.staff_name = 'bob'
ORDER BY
`case`.number ASC;
Check the query below:
SELECT c.number, s.staff_name AS `case owner`, cl.client_name,
c.address, MAX(r.compiled_date), r.recommendation
FROM `case` c
INNER JOIN (SELECT r.case_number, r.compiled_date, r.recommendation
FROM report r ORDER BY r.case_number, r.compiled_date DESC
) r ON r.case_number = c.number
INNER JOIN client cl ON c.client_number = cl.client_number
INNER JOIN staff s ON c.staff_number = s.staff_number
WHERE c.active = 1 AND s.staff_name = 'bob'
GROUP BY c.number
ORDER BY c.number ASC
SELECT
`case`.number,
staff.staff_name AS `case owner`,
client.client_name,
`case`.address,
(SELECT MAX(compiled_date) FROM report WHERE case_number = `case`.number),
report.recommendation
FROM
`case`
INNER JOIN report ON report.case_number = `case`.number
INNER JOIN client ON `case`.client_number = client.client_number
INNER JOIN staff ON `case`.staff_number = staff.staff_number
WHERE
`case`.active = 1 AND
staff.staff_name = 'bob'
GROUP BY
`case`.number
ORDER BY
`case`.number ASC
try this