MySQL Query limiting results by sub table - mysql

I'm really struggling with this query and I hope somebody can help.
I am querying across multiple tables to get the dataset that I require. The following query is an anonymised version:
SELECT main_table.id,
sub_table_1.field_1,
main_table.field_1,
main_table.field_2,
main_table.field_3,
main_table.field_4,
main_table.field_5,
main_table.field_6,
main_table.field_7,
sub_table_2.field_1,
sub_table_2.field_2,
sub_table_2.field_3,
sub_table_3.field_1,
sub_table_4.field_1,
sub_table_4.field_2
FROM main_table
INNER JOIN sub_table_4 ON sub_table_4.id = main_table.id
INNER JOIN sub_table_2 ON sub_table_2.id = main_table.id
INNER JOIN sub_table_3 ON sub_table_3.id = main_table.id
INNER JOIN sub_table_1 ON sub_table_1.id = main_table.id
WHERE sub_table_4.field_1 = '' AND sub_table_4.field_2 = '0' AND sub_table_2.field_1 != ''
The query works. The problem is that sub_table_1 has a revision number (int 11). Currently I get duplicate records with different revision numbers and different versions of sub_table_1.field_1, which is to be expected. I want to limit the result set to the latest revision number only, giving me just the latest sub_table_1.field_1, and I really cannot figure it out!
Can anybody lend me a hand?
Many Thanks.

It's always important to remember that a JOIN can be on a subquery as well as a table. You could build a subquery that returns the results you want to see then, once you've got the data you want, join it in the parent query.
It's hard to 'tailor' an answer that's specific to your problem, as it's too obfuscated (as you admit) to know what the data and tables really look like, but as an example:
Say table1 has four fields: id, revision_no, name and stuff. You want to return a distinct list of name values, with their latest version of stuff (which, we'll pretend varies by revision). You could do this in isolation as:
select t.* from table1 t
inner join
(SELECT name, max(revision_no) maxr
FROM table1
GROUP BY name) mx
on mx.name = t.name
and mx.maxr = t.revision_no;
(Note: see fiddle at the end)
That would return each individual name with the latest revision of stuff.
Once you've got that nailed down, you could then swap out
INNER JOIN sub_table_1 ON sub_table_1.id = main_table.id
....with....
INNER JOIN (select t.* from table1 t
inner join
(SELECT name, max(revision_no) maxr
FROM table1
GROUP BY name) mx
on mx.name = t.name
and mx.maxr = t.revision_no) sub_table_1
ON sub_table_1.id = main_table.id
...which would allow a join with a recordset that is more tailored to that which you want to join (again, don't get hung up on the actual query I've used, it's just there to demonstrate the method).
There may well be more elegant ways to achieve this, but it's sometimes good to start with a simple solution that's easier to replicate, then simplify it once you've got the general understanding of the what and why nailed down.
Hope that helps - as I say, it's as specific as I could offer without having an idea of the real data you're using.
(for the sake of reference, here is a fiddle with a working version of the above example query)
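For anyone who wants to try the example locally, here is a minimal sketch of the same max-revision join. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows are invented purely for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; the sample rows are made up for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (
        id INTEGER PRIMARY KEY,
        revision_no INTEGER,
        name TEXT,
        stuff TEXT
    );
    INSERT INTO table1 (revision_no, name, stuff) VALUES
        (1, 'alpha', 'first draft'),
        (2, 'alpha', 'second draft'),
        (1, 'beta',  'only draft'),
        (3, 'alpha', 'final');
""")

# The derived table mx finds the highest revision per name;
# joining back to table1 picks up the rest of that row.
latest_rows = conn.execute("""
    SELECT t.name, t.revision_no, t.stuff
    FROM table1 t
    INNER JOIN (SELECT name, MAX(revision_no) AS maxr
                FROM table1
                GROUP BY name) mx
            ON mx.name = t.name
           AND mx.maxr = t.revision_no
    ORDER BY t.name
""").fetchall()

for row in latest_rows:
    print(row)
```

Each name appears once, with the stuff value from its highest revision_no.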

In your case, where you only need one column from the table, make this a subquery in your select clause instead of a join. You get the latest revision by ordering by revision number descending and limiting the result to one row.
SELECT
main_table.id,
(
select sub_table_1.field_1
from sub_table_1
where sub_table_1.id = main_table.id
order by revision_number desc
limit 1
) as sub_table_1_field_1,
main_table.field_1,
...
FROM main_table
INNER JOIN sub_table_4 ON sub_table_4.id = main_table.id
INNER JOIN sub_table_2 ON sub_table_2.id = main_table.id
INNER JOIN sub_table_3 ON sub_table_3.id = main_table.id
WHERE sub_table_4.field_1 = ''
AND sub_table_4.field_2 = '0'
AND sub_table_2.field_1 != '';
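A cut-down sketch of this correlated-subquery approach follows. It assumes Python's sqlite3 module in place of MySQL, keeps only main_table and sub_table_1, and uses invented sample rows.

```python
import sqlite3

# SQLite stands in for MySQL; only two of the question's tables are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_table (id INTEGER PRIMARY KEY, field_1 TEXT);
    CREATE TABLE sub_table_1 (id INTEGER, revision_number INTEGER, field_1 TEXT);
    INSERT INTO main_table VALUES (1, 'first'), (2, 'second'), (3, 'no revisions');
    INSERT INTO sub_table_1 VALUES
        (1, 1, 'v1'), (1, 2, 'v2'), (1, 3, 'v3'),
        (2, 1, 'only');
""")

# The scalar subquery runs per main_table row and returns the field_1
# of the highest revision_number, or NULL when there is no revision.
rows = conn.execute("""
    SELECT main_table.id,
           (SELECT sub_table_1.field_1
            FROM sub_table_1
            WHERE sub_table_1.id = main_table.id
            ORDER BY revision_number DESC
            LIMIT 1) AS sub_table_1_field_1,
           main_table.field_1
    FROM main_table
    ORDER BY main_table.id
""").fetchall()

for row in rows:
    print(row)
```

One difference from the original INNER JOIN is worth noting: a main_table row with no matching revision is kept, with NULL in the subquery column, rather than dropped.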

Related

Working with SELECT and SUB SELECT in MySQL

I have a question about SQL. I have never worked with subselects and I ended up getting lost.
My SQL:
SELECT CLI.id, CLI.nome, CLI.senha, CLI.email, CLI.cpf, CLI.celular, CLI.data_nasc, CLI.genero, CLI.data_cadastro, CLI.status, CLI.id_socket, ATEN.mensagem, ARQ.nome AS foto, ATEN.data_mensagem
FROM ut_clientes AS CLI
LEFT JOIN ut_arquivos AS ARQ ON (ARQ.id_tipo = CLI.id AND ARQ.tipo = "ut_clientes")
INNER JOIN ut_atendimentos AS ATEN ON (ATEN.id_usuario_envio = CLI.id)
WHERE ATEN.id_usuario_envio != 59163
GROUP BY CLI.id
ORDER BY ATEN.data_mensagem
DESC
Well, what I would like to do is group the messages according to the customer ID and bring only the last message recorded in the database according to the data_mensagem.
I have tried many ways, but the message displayed is always the first one inserted in the DB rather than the last.
If anyone can help me, I'll be grateful. Thank you guys!
This may help you... I am using a join to a pre-query (PQ alias). This query just goes to your messages and grabs the client ID and the most recent based on the MAX(). By doing the group by here, it will at most return 1 record per client. I also have the WHERE clause to exclude the one ID you listed.
From THAT result, you do a simple join to the rest of your query, joining back to ut_atendimentos on the client ID and that most recent date to pick up the message text.
SELECT
CLI.id,
CLI.nome,
CLI.senha,
CLI.email,
CLI.cpf,
CLI.celular,
CLI.data_nasc,
CLI.genero,
CLI.data_cadastro,
CLI.status,
CLI.id_socket,
ATEN.mensagem,
ARQ.nome AS foto,
PQ.MostRecentMsg AS data_mensagem
FROM
ut_clientes AS CLI
LEFT JOIN ut_arquivos AS ARQ
ON CLI.id = ARQ.id_tipo
AND ARQ.tipo = "ut_clientes"
INNER JOIN
( select
ATEN.id_usuario_envio,
MAX( ATEN.data_mensagem ) as MostRecentMsg
from
ut_atendimentos AS ATEN
where
ATEN.id_usuario_envio != 59163
group by
ATEN.id_usuario_envio ) PQ
ON CLI.id = PQ.id_usuario_envio
INNER JOIN ut_atendimentos AS ATEN
ON ATEN.id_usuario_envio = PQ.id_usuario_envio
AND ATEN.data_mensagem = PQ.MostRecentMsg
GROUP BY
CLI.id
ORDER BY
PQ.MostRecentMsg DESC

Grouping method

I am working on a query with the following format:
I require all the columns from table 'A', but only the summed amount (sum(amount)) from table 'B'.
SELECT A.*, sum(B.CURTRXAM) as 'Current Transaction Amt'
FROM A
LEFT JOIN C
ON A.Schedule_Number = C.Schedule_Number
LEFT JOIN B
ON A.DOCNUMBR = B.DOCNUMBR
AND A.CUSTNMBR = B.CUSTNMBR
GROUP BY A
ORDER BY A.CUSTNMBR
My question is about the GROUP BY clause. Table A has about 12 columns, and grouping by each one individually is tedious. Is there a cleaner way to do this, such as:
GROUP BY A
I am not sure whether a simpler way exists, as I am new to SQL. I have previously looked into GROUPING_ID, but that's about it.
Any help on lumped methods of grouping would be helpful
Since the docnumber is the primary key - just use the following SQL:
SELECT A.*, sum(B.CURTRXAM) as 'Current Transaction Amt'
FROM A
LEFT JOIN C
ON A.Schedule_Number = C.Schedule_Number
LEFT JOIN B
ON A.DOCNUMBR = B.DOCNUMBR
GROUP BY A.DOCNUMBR
ORDER BY A.CUSTNMBR

MYSQL multiple INNER JOIN retrieve every information

I want to run the following query in my script, but it doesn't work correctly.
I'm not getting any errors. It selects everything from hs_data correctly, but from hs_download_links it only retrieves as many rows as there are entries in hs_images.
The goal of this query is to get every entry from hs_data. If there's information related to these entries in hs_download_links or hs_images, it should get that as well.
SELECT
h.hacks_ID, h.hacks_Name, h.hacks_Name_Full,
h.hacks_Version, h.hacks_Description, h.hacks_AddDate,
h.hacks_Type, SUM(dl.link_count) AS link_count, i.image_NameThumb
FROM
hs_data h
LEFT JOIN
(hs_download_links dl CROSS JOIN hs_images i)
ON
((dl.link_hackID = h.hacks_ID AND i.image_HackID = h.hacks_ID)
OR
(dl.link_hackID = h.hacks_ID AND i.image_HackID is NULL)
OR
(dl.link_hackID is NULL AND i.image_HackID = h.hacks_ID)
OR
(dl.link_hackID is NULL AND i.image_HackID is NULL) )
GROUP BY
h.hacks_ID
ORDER BY link_count DESC
Would be great if you guys could help me, much thanks :)
I think you can write this as:
SELECT h.hacks_ID, h.hacks_Name, h.hacks_Name_Full,
h.hacks_Version, h.hacks_Description, h.hacks_AddDate,
h.hacks_Type, SUM(dl.link_count) AS link_count, i.image_NameThumb
FROM hs_data h LEFT JOIN
hs_download_links dl
on dl.link_hackID = h.hacks_ID LEFT JOIN
hs_images i
on i.image_HackID = h.hacks_ID
GROUP BY h.hacks_ID
ORDER BY link_count DESC;
I'm not sure what you are trying to accomplish with the cross join, but it seems unnecessarily confusing. You can chain left outer joins together. They still keep all the rows in the first table.
By the way, your select has a dangling i.image_NameThumb. That means that only one such value will appear. If there are multiple values, you might want to use group_concat(). Or even min() or max() just to clarify the intent of the query.
This will return the number of download links and the first image, if any, for each record in hs_data:
SELECT d.*,
(
SELECT COUNT(*)
FROM hs_download_links dl
WHERE dl.link_hackId = d.hacks_Id
),
(
SELECT i.image_NameThumb
FROM hs_images i
WHERE i.image_hackId = d.hacks_Id
ORDER BY
image_NameThumb
LIMIT 1
)
FROM hs_data d
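One reason to prefer per-table subqueries over chaining both joins: when a record has several download links and several images, the two LEFT JOINs pair every link with every image, so an aggregate over the links is inflated. The sketch below illustrates this under some assumptions: SQLite (via Python's sqlite3) stands in for MySQL, the tables are cut down to the columns needed, the rows are invented, and SUM(link_count) is used in the subquery (instead of the COUNT(*) above) so the two results are directly comparable.

```python
import sqlite3

# SQLite stands in for MySQL; one hack with two links and two images.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hs_data (hacks_ID INTEGER PRIMARY KEY, hacks_Name TEXT);
    CREATE TABLE hs_download_links (link_hackID INTEGER, link_count INTEGER);
    CREATE TABLE hs_images (image_HackID INTEGER, image_NameThumb TEXT);
    INSERT INTO hs_data VALUES (1, 'demo');
    INSERT INTO hs_download_links VALUES (1, 10), (1, 5);
    INSERT INTO hs_images VALUES (1, 'a.png'), (1, 'b.png');
""")

# Chained joins: each of the 2 links is repeated once per image (2 x 2 rows),
# so every link_count is added twice.
joined_sum = conn.execute("""
    SELECT SUM(dl.link_count)
    FROM hs_data h
    LEFT JOIN hs_download_links dl ON dl.link_hackID = h.hacks_ID
    LEFT JOIN hs_images i ON i.image_HackID = h.hacks_ID
    GROUP BY h.hacks_ID
""").fetchone()[0]

# Correlated subquery: the links are aggregated on their own,
# independent of how many images exist.
subquery_sum = conn.execute("""
    SELECT (SELECT SUM(dl.link_count)
            FROM hs_download_links dl
            WHERE dl.link_hackID = d.hacks_ID)
    FROM hs_data d
""").fetchone()[0]

print(joined_sum, subquery_sum)  # prints "30 15"
```

The true total for the sample data is 15; the joined version reports 30 because of the row multiplication.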

Replacing Subqueries with Joins in MySQL

I have the following query:
SELECT PKID, QuestionText, Type
FROM Questions
WHERE PKID IN (
SELECT FirstQuestion
FROM Batch
WHERE BatchNumber IN (
SELECT BatchNumber
FROM User
WHERE RandomString = '$key'
)
)
I've heard that sub-queries are inefficient and that joins are preferred. I can't find anything explaining how to convert a 3+ tier sub-query to join notation, however, and can't get my head around it.
Can anyone explain how to do it?
SELECT DISTINCT a.*
FROM Questions a
INNER JOIN Batch b
ON a.PKID = b.FirstQuestion
INNER JOIN User c
ON b.BatchNumber = c.BatchNumber
WHERE c.RandomString = '$key'
DISTINCT is specified because a row in Questions might match multiple rows in the other tables, causing duplicate records in the result. Since you are only interested in records from the Questions table, the DISTINCT keyword will suffice.
To further gain more knowledge about joins, kindly visit the link below:
Visual Representation of SQL Joins
Try :
SELECT q.PKID, q.QuestionText, q.Type
FROM Questions q
INNER JOIN Batch b ON q.PKID = b.FirstQuestion
INNER JOIN User u ON u.BatchNumber = b.BatchNumber
WHERE u.RandomString = '$key'
select
q.pkid,
q.questiontext,
q.type
from user u
join batch b
on u.batchnumber = b.batchnumber
join questions q
on b.firstquestion = q.pkid
where u.randomstring = '$key'
Since your WHERE clause filters on the USER table, start with that in the FROM clause. Next, apply your joins backwards.
In order to do this correctly, you need distinct in the subquery. Otherwise, you might multiply rows in the join version:
SELECT q.PKID, q.QuestionText, q.Type
FROM Questions q join
(select distinct FirstQuestion
from Batch b join user u
on b.batchnumber = u.batchnumber and
u.RandomString = '$key'
) fq
on q.pkid = fq.FirstQuestion
As to whether the in or join version is better . . . that depends. In some cases, particularly if the fields are indexed, the in version might be fine.
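To see the row multiplication that the answers above warn about, here is a small sketch comparing the nested IN query, a plain join, and a join with DISTINCT. The assumptions: Python's sqlite3 stands in for MySQL, the sample data is invented so that two batches share the same first question, and the key is passed as a bound parameter (the `?` placeholder) rather than interpolated as '$key'.

```python
import sqlite3

# SQLite stands in for MySQL; two batches point at the same first question
# and both belong to users holding the same key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Questions (PKID INTEGER PRIMARY KEY, QuestionText TEXT, Type TEXT);
    CREATE TABLE Batch (BatchNumber INTEGER, FirstQuestion INTEGER);
    CREATE TABLE "User" (BatchNumber INTEGER, RandomString TEXT);
    INSERT INTO Questions VALUES (100, 'How old are you?', 'int');
    INSERT INTO Batch VALUES (7, 100), (8, 100);
    INSERT INTO "User" VALUES (7, 'abc'), (8, 'abc');
""")
key = "abc"

# Original nested IN form: a question is either in the set or not.
in_rows = conn.execute("""
    SELECT PKID FROM Questions
    WHERE PKID IN (
        SELECT FirstQuestion FROM Batch
        WHERE BatchNumber IN (
            SELECT BatchNumber FROM "User" WHERE RandomString = ?))
""", (key,)).fetchall()

# Plain join: one output row per matching (batch, user) combination.
join_rows = conn.execute("""
    SELECT q.PKID
    FROM Questions q
    INNER JOIN Batch b ON q.PKID = b.FirstQuestion
    INNER JOIN "User" u ON u.BatchNumber = b.BatchNumber
    WHERE u.RandomString = ?
""", (key,)).fetchall()

# Join with DISTINCT: duplicates collapsed again.
distinct_rows = conn.execute("""
    SELECT DISTINCT q.PKID
    FROM Questions q
    INNER JOIN Batch b ON q.PKID = b.FirstQuestion
    INNER JOIN "User" u ON u.BatchNumber = b.BatchNumber
    WHERE u.RandomString = ?
""", (key,)).fetchall()

print(len(in_rows), len(join_rows), len(distinct_rows))  # prints "1 2 1"
```

With this data the IN form and the DISTINCT join both return the question once, while the plain join returns it twice.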

Why Does My MySQL Query Using a Subselect Hang?

The following query hangs (although the subqueries performed separately are fine):
I don't know how to make the explain table look ok. If someone tells me, I'll clean it up.
select
sum(grades.points) as p
from assignments
left join grades using (assignmentID)
where gradeID IN
(select grades.gradeID
from assignments
left join grades using (assignmentID)
where ... grades.date <= '1255503600' AND grades.date >= '984902400'
group by assignmentID order by grades.date DESC);
I think the problem is with the first grades table... the type ALL with that many rows seems to be the cause.. Everything is indexed.
I uploaded the table as an image. Couldn't get the formatting right:
http://imgur.com/AjX34.png
A commenter wanted the full where clause:
explain extended select count(assignments.assignmentID) as asscount, sum(TRIM(TRAILING '-' FROM grades.points)) as p, sum(assignments.points) as t
from assignments left join grades using (assignmentID)
where gradeID IN
(select grades.gradeID from assignments left join grades using (assignmentID) left join as_types on as_types.ID = assignments.type
where assignments.classID = '7815'
and (assignments.type = 30170 )
and grades.contactID = 7141
and grades.points REGEXP '^[-]?[0-9]+[-]?'
and grades.points != '-'
and grades.points != ''
and (grades.pointsposs IS NULL or grades.pointsposs = '')
and grades.date <= '1255503600'
AND grades.date >= '984902400'
group by assignmentID
order by grades.date DESC);
See "The unbearable slowness of IN":
http://www.artfulsoftware.com/infotree/queries.php#568
Super messy, but: (thanks for everyone's help)
SELECT *
FROM grades
LEFT JOIN assignments ON grades.assignmentID = assignments.assignmentID
RIGHT JOIN (
SELECT g.gradeID
FROM assignments a
LEFT JOIN grades g
USING ( assignmentID )
WHERE a.classID = '7815'
AND (
a.type =30170
)
AND g.contactID =7141
AND g.points
REGEXP '^[-]?[0-9]+[-]?'
AND g.points != '-'
AND g.points != ''
AND (
g.pointsposs IS NULL
OR g.pointsposs = ''
)
AND g.date <= '1255503600'
AND g.date >= '984902400'
GROUP BY assignmentID
ORDER BY g.date DESC
) AS t1 ON t1.gradeID = grades.gradeID
Suppose you use a Real Database (ie, any database except MySQL, but I'll use Postgres as an example) to do this query :
SELECT * FROM ta WHERE aid IN (SELECT subquery)
a Real Database would look at the subquery and estimate its rowcount :
If the rowcount is small (say, less than a few millions)
It would run the subquery, then build an in-memory hash of ids, which also makes them unique, which is a feature of IN().
Then, if the number of rows pulled from ta is a small part of ta, it would use a suitable index to pull the rows. Or, if a major part of the table is selected, it would just scan it entirely, and lookup each id in the hash, which is very fast.
If however the subquery rowcount is quite large
The database would probably rewrite it as a merge JOIN, adding a Sort+Unique to the subquery.
However, you are using MySQL. In this case, it will not do any of this (it is gonna re-execute the subquery for each row of your table) so it will take 1000 years. Sorry.
If your subquery performs fine when it is executed separately, then try using a JOIN rather than IN, like this:
select count(assignments.assignmentID) as asscount, sum(TRIM(TRAILING '-' FROM grades.points)) as p, sum(assignments.points) as t
from assignments left join grades using (assignmentID)
join
(select grades.gradeID from assignments left join grades using (assignmentID) left join as_types on as_types.ID = assignments.type
where assignments.classID = '7815'
and (assignments.type = 30170 )
and grades.contactID = 7141
and grades.points REGEXP '^[-]?[0-9]+[-]?'
and grades.points != '-'
and grades.points != ''
and (grades.pointsposs IS NULL or grades.pointsposs = '')
and grades.date <= '1255503600'
AND grades.date >= '984902400'
group by assignmentID
order by grades.date DESC) t using (gradeID);
There really isn't enough information to answer your question, and you've put a ... in the middle of the where clause which is weird. How big are the tables involved and what are the indexes?
Having said that, if there are too many terms in an in clause, you can see seriously degraded performance. Replace the use of in with a right join.
For starters, the table as_types in the in clause is not used. Left joining it serves no purpose so get rid of it.
That leaves the in clause with only the assignments and grades tables, the same ones as in the outer query. Clearly the where conditions that modify assignments belong in the where clause of the outer query. You should move all of the where grades=whatever conditions into the on clause of the left join to grades.
The query is a little tough to follow, but I suspect that the subquery isn't necessary at all.
It seems like your query is basically thus:
SELECT FOO()
FROM assignments LEFT JOIN grades USING (assignmentID)
WHERE gradeID IN
(
SELECT grades.gradeID
FROM assignments LEFT JOIN grades USING (assignmentID)
WHERE your_conditions = TRUE
);
But, you're not doing anything really fancy in the where clause in the subquery.
I suspect something more like
SELECT FOO()
FROM assignments LEFT JOIN grades USING (assignmentID)
WHERE your_conditions_with_some_tweaks = TRUE
GROUP BY groupings;
would work just as well.
If I'm missing some key logic here please comment back and I'll edit/delete this post.