I have been looking for a solution for this in SQL. I am trying to find records in one table that have the same first two characters of the first name and the same birth date. I thought about doing a self-join, but I doubt I am getting the right results. Here is my query, please tell me what's missing:
select p1.frst_name,
from person p1 inner join person p2
on upper(left(p1.frst_name,2)) like upper(left(p2.frst_name,2))
and upper(p1.last_name) LIKE upper(p2.last_name)
and p1.birth_date = p2.birth_date
Join on last_name and birth_date, since you want those to match exactly, then filter on the first two characters matching.
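You may not need upper() on p1.frst_name or p2.frst_name. If the column uses a case-insensitive collation (the MySQL default), the comparison already ignores case. With a case-sensitive collation, or in Oracle, keep it, because two different rows can store the same name in different cases.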
Try...
select p1.frst_name
from person p1
full outer join person p2
on p1.last_name = p2.last_name
and p1.birth_date = p2.birth_date
where upper(left(p1.frst_name,2)) like upper(left(p2.frst_name,2))
Change LIKE to = (you want an exact match), and add a join condition to prevent rows from joining to themselves:
select p1.id, p1.frst_name, p1.last_name, p1.birth_date
from person p1
join person p2
on upper(left(p1.frst_name,2)) = upper(left(p2.frst_name,2))
and upper(p1.last_name) = upper(p2.last_name)
and p1.birth_date = p2.birth_date
and p1.id != p2.id
Without the addition of and p1.id != p2.id, every row would be returned, because of course every row would otherwise match itself.
The question was tagged with both mysql and oracle. The above query works in MySQL. For Oracle, which doesn't support left(col, 2), use substr(col, 1, 2) instead.
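To see the effect of the p1.id != p2.id condition, here is a minimal sketch that runs the same self-join against an in-memory SQLite database from Python. The sample rows are made up, SQLite is only a stand-in for MySQL or Oracle, and substr(col, 1, 2) is used because SQLite has no left() either.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL/Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, frst_name TEXT, "
    "last_name TEXT, birth_date TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [
        (1, "Jonathan", "Smith", "1980-01-01"),
        (2, "joe", "SMITH", "1980-01-01"),   # same prefix, differs only in case
        (3, "Mary", "Smith", "1980-01-01"),  # different first two characters
        (4, "Joan", "Smith", "1975-06-30"),  # different birth date
    ],
)

query = """
SELECT p1.id, p1.frst_name
FROM person p1
JOIN person p2
  ON upper(substr(p1.frst_name, 1, 2)) = upper(substr(p2.frst_name, 1, 2))
 AND upper(p1.last_name) = upper(p2.last_name)
 AND p1.birth_date = p2.birth_date
 AND p1.id != p2.id  -- keep a row from matching itself
ORDER BY p1.id
"""
matches = conn.execute(query).fetchall()
print(matches)  # rows 1 and 2 match each other; rows 3 and 4 match nothing
```

If you drop the last join condition, all four rows come back, since each row trivially matches itself.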
Related
I have the following results from a query:
The user "Adriana Smith" with ID 6 is repeated because she has several contracts with different dates. To get that, I did a left join from table bo_users to bo_users_contracts (a 1:m, one-to-many relation). The query is below:
SELECT bo_users.ID, bo_users.display_name, COALESCE (bo_users_contracts.contract_start_date,'-') AS contract_start_date, COALESCE (bo_users_contracts.contract_end_date, '-') AS contract_end_date, COALESCE (bo_users_contracts.current,'-') AS current
FROM bo_users
LEFT JOIN bo_users_contracts ON bo_users.ID = bo_users_contracts.bo_users_id
LEFT JOIN bo_usermeta ON bo_users.ID = bo_usermeta.user_id
WHERE (bo_usermeta.meta_key = 'role' AND bo_usermeta.meta_value = 'member')
I want to get all users, but for the user Adriana I only want the row where the "current" column = 1.
So the final result would be three user records:
Alejandro, Rhonda and Adriana (with "current" = 1)
Thank you!
Since you want to limit on a table that is being outer joined, the limit should be placed in the join condition itself, so that all records from bo_users are retained (which is what the outer join indicates you want).
Essentially, the limit is applied before the join, so the bo_users records with no match in bo_users_contracts are kept. If it were applied after the join, in a where clause, the bo_users records without a matching contract would have a null value for current and would be excluded when the current = 1 filter is applied.
In this example, the only columns that should appear in the where clause are those from bo_users.
I'd even move the bo_usermeta filters to the join, or you may lose bo_users records; alternatively, the left join on the third table should be an inner join.
SELECT bo_users.ID
, bo_users.display_name
, COALESCE (bo_users_contracts.contract_start_date,'-') AS contract_start_date
, COALESCE (bo_users_contracts.contract_end_date, '-') AS contract_end_date
, COALESCE (bo_users_contracts.current,'-') AS current
FROM bo_users
LEFT JOIN bo_users_contracts
ON bo_users.ID = bo_users_contracts.bo_users_id
and bo_users_contracts.current = 1
LEFT JOIN bo_usermeta --This is suspect
ON bo_users.ID = bo_usermeta.user_id
WHERE (bo_usermeta.meta_key = 'role' --this is suspect
AND bo_usermeta.meta_value = 'member') --this is suspect
The lines marked "this is suspect" are flagged because you have a left join, which means you want all users from bo_users. However, if a user doesn't have the meta_key or meta_value defined, they would be eliminated by the where clause. Either change the join to an inner join or move the where clause limits into the join. I point this out because your query is inconsistent in its definition, which leads to ambiguity when it is maintained later.
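The difference between filtering in the ON clause and filtering in the WHERE clause is easier to see on a few rows. This is only a sketch: it uses Python's sqlite3 module with an in-memory database as a stand-in for your server, the sample rows and dates are invented, and bo_usermeta is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bo_users (ID INTEGER PRIMARY KEY, display_name TEXT);
CREATE TABLE bo_users_contracts (
    bo_users_id INTEGER, contract_start_date TEXT, "current" INTEGER);
-- Hypothetical rows: Alejandro has no contract, Adriana has two.
INSERT INTO bo_users VALUES (1, 'Alejandro'), (2, 'Rhonda'), (6, 'Adriana Smith');
INSERT INTO bo_users_contracts VALUES
    (2, '2020-01-01', 1),
    (6, '2019-01-01', 0),
    (6, '2021-01-01', 1);
""")

# Filter inside the join: unmatched users survive with NULL contract columns.
filter_in_on = conn.execute("""
SELECT u.ID, u.display_name, c.contract_start_date
FROM bo_users u
LEFT JOIN bo_users_contracts c
  ON u.ID = c.bo_users_id AND c."current" = 1
ORDER BY u.ID
""").fetchall()

# Filter after the join: rows whose "current" is NULL are thrown away.
filter_in_where = conn.execute("""
SELECT u.ID, u.display_name, c.contract_start_date
FROM bo_users u
LEFT JOIN bo_users_contracts c ON u.ID = c.bo_users_id
WHERE c."current" = 1
ORDER BY u.ID
""").fetchall()

print(filter_in_on)     # three users, Alejandro with None as the date
print(filter_in_where)  # only Rhonda and Adriana
```

The same reasoning applies to the bo_usermeta conditions: left in the WHERE clause, they behave as if the join were an inner join.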
I'm really struggling with this query and I hope somebody can help.
I am querying across multiple tables to get the dataset that I require. The following query is an anonymised version:
SELECT main_table.id,
sub_table_1.field_1,
main_table.field_1,
main_table.field_2,
main_table.field_3,
main_table.field_4,
main_table.field_5,
main_table.field_6,
main_table.field_7,
sub_table_2.field_1,
sub_table_2.field_2,
sub_table_2.field_3,
sub_table_3.field_1,
sub_table_4.field_1,
sub_table_4.field_2
FROM main_table
INNER JOIN sub_table_4 ON sub_table_4.id = main_table.id
INNER JOIN sub_table_2 ON sub_table_2.id = main_table.id
INNER JOIN sub_table_3 ON sub_table_3.id = main_table.id
INNER JOIN sub_table_1 ON sub_table_1.id = main_table.id
WHERE sub_table_4.field_1 = '' AND sub_table_4.field_2 = '0' AND sub_table_2.field_1 != ''
The query works, the problem I have is sub_table_1 has a revision number (int 11). Currently I get duplicate records with different revision numbers and different versions of sub_table_1.field_1 which is to be expected, but I want to limit the result set to only include results limited by the latest revision number, giving me only the latest sub_table_1_field_1 and I really can not figure it out!
Can anybody lend me a hand?
Many Thanks.
It's always important to remember that a JOIN can be on a subquery as well as a table. You could build a subquery that returns the results you want to see then, once you've got the data you want, join it in the parent query.
It's hard to 'tailor' an answer that's specific to your problem, as it's too obfuscated (as you admit) to know what the data and tables really look like, but as an example:
Say table1 has four fields: id, revision_no, name and stuff. You want to return a distinct list of name values, each with its latest version of stuff (which, we'll pretend, varies by revision). You could do this in isolation as:
select t.* from table1 t
inner join
(SELECT name, max(revision_no) maxr
FROM table1
GROUP BY name) mx
on mx.name = t.name
and mx.maxr = t.revision_no;
(Note: see fiddle at the end)
That would return each individual name with the latest revision of stuff.
Once you've got that nailed down, you could then swap out
INNER JOIN sub_table_1 ON sub_table_1.id = main_table.id
....with....
INNER JOIN (select t.* from table1 t
inner join
(SELECT name, max(revision_no) maxr
FROM table1
GROUP BY name) mx
on mx.name = t.name
and mx.maxr = t.revision_no) sub_table_1
ON sub_table_1.id = main_table.id
...which would allow a join with a recordset that is more tailored to that which you want to join (again, don't get hung up on the actual query I've used, it's just there to demonstrate the method).
There may well be more elegant ways to achieve this, but it's sometimes good to start with a simple solution that's easier to replicate, then simplify it once you've got the general understanding of the what and why nailed down.
Hope that helps - as I say, it's as specific as I could offer without having an idea of the real data you're using.
(for the sake of reference, here is a fiddle with a working version of the above example query)
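For anyone who wants to try the table1 example without a fiddle, here is a small sketch using Python's sqlite3 module. The rows are invented for illustration and SQLite is only standing in for whatever database you are using; the query itself is the join against the max(revision_no) subquery described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER, revision_no INTEGER, name TEXT, stuff TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, 1, "alpha", "old"),   # superseded by revision 2
        (2, 2, "alpha", "new"),
        (3, 1, "beta", "only"),   # single revision
    ],
)

# Join each row to the per-name maximum revision; only the latest rows match.
latest = conn.execute("""
SELECT t.name, t.revision_no, t.stuff
FROM table1 t
INNER JOIN (SELECT name, max(revision_no) AS maxr
            FROM table1
            GROUP BY name) mx
   ON mx.name = t.name
  AND mx.maxr = t.revision_no
ORDER BY t.name
""").fetchall()

print(latest)  # one row per name, carrying that name's highest revision
```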
In your case, where you only need one column from the table, make this a subquery in your select clause instead of a join. You get the latest revision by ordering by revision number descending and limiting the result to one row.
SELECT
main_table.id,
(
select sub_table_1.field_1
from sub_table_1
where sub_table_1.id = main_table.id
order by revision_number desc
limit 1
) as sub_table_1_field_1,
main_table.field_1,
...
FROM main_table
INNER JOIN sub_table_4 ON sub_table_4.id = main_table.id
INNER JOIN sub_table_2 ON sub_table_2.id = main_table.id
INNER JOIN sub_table_3 ON sub_table_3.id = main_table.id
WHERE sub_table_4.field_1 = ''
AND sub_table_4.field_2 = '0'
AND sub_table_2.field_1 != '';
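A cut-down sketch of the same idea, run through Python's sqlite3 module so it can be tried without a server. Only main_table and sub_table_1 are modelled, the rows are invented, and SQLite is an assumption standing in for MySQL (both accept ORDER BY ... LIMIT 1 in a scalar subquery).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_table (id INTEGER PRIMARY KEY, field_1 TEXT);
CREATE TABLE sub_table_1 (id INTEGER, revision_number INTEGER, field_1 TEXT);
-- Hypothetical rows: id 1 has two revisions, id 3 has none.
INSERT INTO main_table VALUES (1, 'm1'), (2, 'm2'), (3, 'm3');
INSERT INTO sub_table_1 VALUES (1, 1, 'v1'), (1, 2, 'v2'), (2, 1, 'only');
""")

rows = conn.execute("""
SELECT m.id,
       (SELECT s.field_1
        FROM sub_table_1 s
        WHERE s.id = m.id
        ORDER BY s.revision_number DESC
        LIMIT 1) AS sub_table_1_field_1
FROM main_table m
ORDER BY m.id
""").fetchall()

print(rows)  # one row per main_table id, with the newest field_1 or None
```

One difference from the original inner join is worth knowing: a main_table row with no revisions at all is still returned, with NULL in that column, rather than being dropped.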
By non-overlapping matches, I mean that each row should be returned at most once. That seems to be the hard part.
I managed to get the best non-overlapping matches using the following query, where pair is a view that has all possible matches as (id1,id2,val1,val2) rows.
SELECT p.* FROM pair p
LEFT JOIN pair p1 ON p1.id1 = p.id1 AND p1.val2 < p.val2
LEFT JOIN pair p2 ON p2.id2 = p.id2 AND p2.val1 < p.val1
WHERE
p1.id1 IS NULL
AND p2.id2 IS NULL;
Full sqlfiddle here http://sqlfiddle.com/#!9/68614/2
For values a,b in t1 and a,d in t2 I want it to return pairs (a,a) and (b,d) but it only returns (a,a)
Could someone provide a working solution? Or if this kind of matching would fundamentally be better done on the client, could you explain why?
-- EDIT
The problem I'm trying to solve is similar to the one discussed here: Retrieving the last record in each group
My requirements are stricter: in addition, I need the matches not to overlap.
Are you looking for something like this?
SELECT p.* FROM pair p
LEFT JOIN pair p1 ON p1.id1 = p.id1 AND p1.val2 < p.val2
LEFT JOIN pair p2 ON p2.id2 = p.id2 AND p2.val1 < p.val1
WHERE
p.id1 = p.id2
ORDER BY id1
SQL FIDDLE
This is my query. The output looks fine except the COUNT function is returning numbers which seem totally arbitrary (e.g. 7-digit numbers where I'd expect 3-digit numbers):
SELECT tc.tableName, m.fieldName, COUNT(m.fieldName)
FROM apiResult, (
SELECT cc.surveyID, cc.fieldName
FROM apiResult as ar
INNER JOIN columnConversion as cc
ON substring(ar.triggerName,-10)=cc.fieldID
) AS m
INNER JOIN tableConversion as tc
ON m.surveyID=tc.surveyID
GROUP BY tc.tableName, m.fieldName;
I think, for a start, that COUNT(m.fieldName) is probably wrong, since it doesn't correspond with GROUP BY tc.tableName, m.fieldName.
Here's what the query is meant to do: one of the tables in the sub-query, apiResult, has a column called 'triggerName' which contains an ID I call 'fieldID', plus a column called 'surveyID'. The tables columnConversion and tableConversion match the IDs to human-readable names. So, the following query produces the count that I want, but I want the IDs replaced by the human-readable names, hence the above query:
SELECT cc.surveyID, cc.fieldName, COUNT(ar.triggerName)
FROM apiResult as ar
INNER JOIN columnConversion as cc
ON substring(ar.triggerName,-10)=cc.fieldID
GROUP BY (ar.triggerName)
Any ideas what I've done wrong?
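Why are you mixing explicit and implicit joins? You appear to have missed a join condition on the first table: the apiResult listed before the comma has no join condition, so it is cross joined with the rest and every count is multiplied by the number of rows in apiResult. Actually, I don't think that table is needed at all, since the subquery already reads apiResult. This should work: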
SELECT tc.tableName, m.fieldName, COUNT(m.fieldName)
FROM (SELECT cc.surveyID, cc.fieldName
FROM apiResult ar INNER JOIN
columnConversion cc
ON substring(ar.triggerName, -10) = cc.fieldID
) m INNER JOIN
tableConversion as tc
ON m.surveyID = tc.surveyID
GROUP BY tc.tableName, m.fieldName;
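To check that the stray apiResult really is what inflates the numbers, here is a small sketch that runs both versions from Python's sqlite3 module. The three sample rows and the ID format are made up, and SQLite is an assumption standing in for MySQL (its substr(x, -10) takes the last ten characters, like MySQL's substring).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE apiResult (triggerName TEXT);
CREATE TABLE columnConversion (fieldID TEXT, surveyID TEXT, fieldName TEXT);
CREATE TABLE tableConversion (surveyID TEXT, tableName TEXT);
-- Hypothetical data: the last 10 characters of triggerName are the field ID.
INSERT INTO apiResult VALUES ('t-F000000001'), ('u-F000000001'), ('t-F000000002');
INSERT INTO columnConversion VALUES
    ('F000000001', 'S1', 'age'), ('F000000002', 'S1', 'height');
INSERT INTO tableConversion VALUES ('S1', 'survey_one');
""")

subquery = """
    SELECT cc.surveyID, cc.fieldName
    FROM apiResult ar
    INNER JOIN columnConversion cc ON substr(ar.triggerName, -10) = cc.fieldID
"""

# Original shape: apiResult appears a second time with no join condition.
inflated = conn.execute(f"""
SELECT tc.tableName, m.fieldName, COUNT(m.fieldName)
FROM apiResult, ({subquery}) AS m
INNER JOIN tableConversion AS tc ON m.surveyID = tc.surveyID
GROUP BY tc.tableName, m.fieldName
ORDER BY m.fieldName
""").fetchall()

# Corrected shape: the extra table is gone.
fixed = conn.execute(f"""
SELECT tc.tableName, m.fieldName, COUNT(m.fieldName)
FROM ({subquery}) AS m
INNER JOIN tableConversion AS tc ON m.surveyID = tc.surveyID
GROUP BY tc.tableName, m.fieldName
ORDER BY m.fieldName
""").fetchall()

print(inflated)  # each count multiplied by the 3 rows in apiResult
print(fixed)     # the real counts
```

With thousands of rows in apiResult, that multiplication is how a 3-digit count turns into a 7-digit one.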
I have this query:
SELECT p.text,se.name,s.sub_name,SUM((p.volume / (SELECT SUM(p.volume)
FROM phrase p
WHERE p.volume IS NOT NULL) * sp.position))
AS `index`
FROM phrase p
LEFT JOIN `position` sp ON sp.phrase_id = p.id
LEFT JOIN `engines` se ON se.id = sp.engine_id
LEFT JOIN item s ON s.id = sp.site_id
WHERE p.volume IS NOT NULL
AND s.ignored = 0
GROUP BY se.name,s.sub_name
ORDER BY se.name,s.sub_name
There are a few things I want to do with it:
1) At the end of the calculation for 'index', I multiply it all by sp.position, then get its SUM. If there is NO MATCH in the first LEFT JOIN 'position', I want to give sp.position a value of 200. So basically, if in the 'phrase' table I have an ID=2 but that does not exist in sp.phrase_id anywhere in the 'position' table, then sp.position=200 for the 'index' calculation; otherwise it will be whatever value is stored in the 'position' table. I hope that makes sense.
2) I do a GROUP BY se.name. I would like to actually SUM the entire 'index' values for similar se.name fields. So in the resultset as it stands now, if there were 20 p.text rows with the same se.name, I would like to SUM the index column for the same se.name(s).
I am more of a PHP guy, but trying to learn more MySQL. I have become a big believer in making the DB do as much of the work as possible instead of trying to manipulate the dataset after it's been returned.
I hope the questions were clear. Anyways, can both 1) and 2) be done? There's much more I want to modify this query to do, but I think if I need more help in the future on it, it would require a different question.
The position table has an engines_id, phrase_id and item_id, which together make an entry unique. The value I am trying to use in the calculation is sp.position. But there are cases when there is no entry for these IDs combined. If there is no entry for the combination of the 3 IDs I just listed, I would like to use sp.position=200 in my calculation.
How's this:
select x.name, sum(x.`index`) from
(
SELECT p.text,se.name,s.sub_name,SUM((p.volume / (SELECT SUM(p.volume)
FROM phrase p
WHERE p.volume IS NOT NULL) * if(sp.position is null,200,sp.position)))
AS `index`
FROM phrase p
LEFT JOIN `position` sp ON sp.phrase_id = p.id
LEFT JOIN `engines` se ON se.id = sp.engine_id
LEFT JOIN item s ON s.id = sp.site_id
WHERE p.volume IS NOT NULL
AND s.ignored = 0
GROUP BY se.name,s.sub_name
ORDER BY se.name,s.sub_name
)x
GROUP BY x.name
Try the following:
1.) Use IFNULL(), in your case IFNULL(sp.position, 200)
2.) I am not entirely clear on this part, but it seems like you already have part of what you are asking.
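For point 1, here is a minimal sketch of the IFNULL default on an unmatched LEFT JOIN row, run through Python's sqlite3 module. It is simplified to just the phrase and position tables, the rows are invented, and SQLite is an assumption standing in for MySQL. The multiplication by 1.0 is only there because SQLite would otherwise do integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phrase (id INTEGER PRIMARY KEY, volume INTEGER);
CREATE TABLE "position" (phrase_id INTEGER, position INTEGER);
-- Hypothetical rows: phrase 2 has no entry in the position table.
INSERT INTO phrase VALUES (1, 100), (2, 300);
INSERT INTO "position" VALUES (1, 5);
""")

# The unmatched phrase gets 200 instead of NULL.
effective = conn.execute("""
SELECT p.id, IFNULL(sp.position, 200) AS effective_position
FROM phrase p
LEFT JOIN "position" sp ON sp.phrase_id = p.id
ORDER BY p.id
""").fetchall()

# Volume-weighted index using the defaulted position.
weighted_index = conn.execute("""
SELECT SUM(p.volume * 1.0
           / (SELECT SUM(volume) FROM phrase WHERE volume IS NOT NULL)
           * IFNULL(sp.position, 200))
FROM phrase p
LEFT JOIN "position" sp ON sp.phrase_id = p.id
WHERE p.volume IS NOT NULL
""").fetchone()[0]

print(effective)
print(weighted_index)
```

Note that the default only helps while the unmatched row is still in the result. A WHERE condition on a column from one of the LEFT JOINed tables (such as s.ignored = 0) removes rows where that column is NULL, so a condition like that would need to move into the join's ON clause.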