MySQL Query returns duplicated results

MySQL Query returns duplicated results - mysql

I am fairly new at MySQL and I am working with a database system which has four main tables, described here:
http://www.pastie.org/3832181
The table that this query primarily works with is here:
http://www.pastie.org/3832184
Seems fairly simple right?
My query's purpose is to grab all the BusinessID's for a explicit User where the OpportunityID's are NULL, once it has those BusinessID's, I want it to find the associated BusinessName in the Business table and match that BusinessName with the BusinessName(Business) in the EmploymentOpportunity table.
This is my query to perform that action.
SELECT EmploymentOpportunity.OpportunityID, Business, Description
FROM UserBusinessOpportunity, Business, EmploymentOpportunity
WHERE UserBusinessOpportunity.BusinessID =
(SELECT UserBusinessOpportunity.BusinessID
FROM UserBusinessOpportunity
WHERE UserBusinessOpportunity.UserID=1 AND
UserBusinessOpportunity.OpportunityID is NULL)
AND UserBusinessOpportunity.BusinessID = Business.BusinessID
AND Business.BusinessName = EmploymentOpportunity.Business;
The sub-select statement is supposed to return the subset of BusinessID's. I'm sure this is an extremely simple query, but it keeps giving me duplicate results and I'm sure why. The set of results should be 3, but it's sending me 24, or 8 repeating sets of those 3.
Thanks, if you can help me figure this out.

Use distinct keyword to remove duplicates.
Have a look here

You might be missing a constraint somewhere when you are doing the join in your SQL statement. From what I got out of your description, it sounds like this query might work for you.
SELECT
User.UserID,
Business.BusinessID,
Business.BusinessName
FROM
User,
UserBusinessOpportunity,
Business
WHERE
User.UserID = 1 AND
User.UserID = UserBusinessOpportunity.UserID AND
UserBusinessOpportunity.OpportunityID IS NULL AND
UserBusinessOpportunity.BusinessID = Business.BusinessID

I think you intend an inner join but your query is an implicit outer join.
Joining with null fields often works best with a left join.
Try this:
SELECT EmploymentOpportunity.OpportunityID, Business, Description
FROM EmploymentOpportunity
JOIN Business ON Business.BusinessName = EmploymentOpportunity.Business
LEFT JOIN UserBusinessOpportunity USING(BusinessID)
WHERE
UserBusinessOpportunity.UserID=1
AND
UserBusinessOpportunity.OpportunityID is NULL
Julian

Related

Join to tables and String Compare (large data set)

I am very new to SQL and don't really know much about what i'm doing. I'm trying to figure out how to get a list of leads and owners whose corresponding campaign record types are stated as "inter"
So far I have tried joining the two tables and running a string compare I found on a different stack overflow page. Separately they work fine but together everything breaks... I only get the error "You have an error in your SQL syntax; check the manual"
select a.LeadId, b.OwnerId from
(select * from CampaignMember as a
join
select * from Campaign as b
on b.id = a.CampaignId)
where b.RecordTypeId like "inter%"
Schema:
Campaign CampaignMember
------------- ----------------
Id CampaignId
OwnerId LeadId
RecordTypeId ContactId
The string compare is also very slow. I am looking at a table of 600M values. Is there a faster alternative?
Is there also a way to get more specific errors in MySQL?

If you format your code properly, it will be very easy to see why it's not working.
select a.LeadId, b.OwnerId
from (
select *
from CampaignMember as a
join select *
from Campaign as b on b.id = a.CampaignId
)
where b.RecordTypeId like "inter%"
It's not a valid JOIN format. Also the last part, SQL use single quote ' instead of double quote "
Probably what you want is something like this
SELECT a.LeadId, b.OwnwerId
FROM CampaignMember a
JOIN Campaign b ON b.id = a.CampaignId
WHERE b.RecordTypeId LIKE 'inter%'

Try this:
select CampaignMember.LeadId, Campaign.OwnerId from
Campaign
inner join
CampaignMember
on CampaignMember.CampaignId= Campaign.id
where Campaign.RecordTypeId like "inter%"

MySql is generally pretty poor and handling sub-selects, so you should avoid them when possible. Also, your sub-select isn't filtering any rows, so it has to evaluate every row before applying the LIKE filter. This is sometimes "intelligently" handled by the query engine, but you should try to minimize reliance on the engine to optimize the query.
Additionally, you really should only return the columns that you care about; SELECT * is ok for confirming things, but slows queries down.
Therefore, the query posted by Eric (above) is actually the best choice.

Finding which of an array of IDs has no record with a single query

I'm generating prepared statements with PHP PDO to pull in information from two tables based on an array of IDs.
Then I realized that if an ID passed had no record I wouldn't know.
I'm locating records with
SELECT
r.`DEANumber`,
TRIM(r.`ActivityCode`) AS ActivityCode,
TRIM(r.`ActivitySubCode`) as ActivitySubCode,
// other fields...
a.Activity
FROM
`registrants` r,
`activities` a
WHERE r.`DEAnumber` IN ( ?,?,?,?,?,?,?,? )
AND a.Code = ActivityCode
AND a.Subcode = ActivitySubCode
But I am having trouble figuring out the negative join that says which of the IDs has no record.
If two tables were involved I think I could do it like this
SELECT
r.DEAnumber
FROM registrant r
LEFT JOIN registrant2 r2 ON r.DEAnumber = r2.DEAnumber
WHERE r2.DEAnumber IS NULL
But I'm stumped as to how to use the array of IDs here. Obviously I could iterate over the array and track which queries had not result but it seems like such a manual and wasteful way to go...

Obviously I could iterate over the array and track which queries had not result but it seems like such a manual and wasteful way to go.
What could be a real waste is spending time solving this non-existent "problem".
Yes, you could iterate. Either manually, or using a syntax sugar like array_diff() in PHP.
I suggest that instead of making your query more complex (means heavier to support) for little gain, you just move on.
As old man Knuth once said 'premature optimization is the root of all evil'.
The only thing I could think of a help from PDO is a fetch mode that will put IDs as keys for the returned array, and thus you'll be able to make it without [explicitly written] loop, like
$stmt->execute($ids);
$data = $stmt->fetchAll(PDO::FETCH_UNIQUE);
$notFound = array_diff($ids, array_keys($data));
Yet a manual loop would have taken only two extra lines, which is, honestly, not that a big deal to talk about.

You are on the right track - a left join that filters out matches will give you the missing joins. You just need to move all conditions on the left-joined table up into the join.
If you leave the conditions on the joined table in the where clause you effectively cause an inner join, because the where clause is executed on the rows after the join is made, which is too late if there was no join in the first place.
Change the query to use proper join syntax, specifying a left join, with the conditions on activity moved to the join'n on clause:
SELECT
r.DEANumber,
TRIM(r.ActivityCode) AS ActivityCode,
TRIM(r.ActivitySubCode) as ActivitySubCode,
// other fields...
a.Activity
FROM registrants r
LEFT JOIN activities a ON a.Code = ActivityCode
AND a.Subcode = ActivitySubCode
WHERE r.DEAnumber IN (?,?,?,?,?,?,?,?)
In your app code, if Activity is null then you know there was no activity for that id.
This won't affect performance much, other than to return (potentially) more rows.
To just select all registrants without activities:
select r.DEAnumber
from registrants r
left join activities a on a.Code = ActivityCode
and a.Subcode = ActivitySubCode
where r.`DEAnumber` IN ( ?,?,?,?,?,?,?,? )
and a.Code is null

Complex MySQL query problems and also SQL hangs

I am trying to write an SQL query which is pretty complex. The requirements are as follows:
I need to return these fields from the query:
track.artist
track.title
track.seconds
track.track_id
track.relative_file
album.image_file
album.album
album.album_id
track.track_number
I can select a random track with the following query:
select
track.artist, track.title, track.seconds, track.track_id,
track.relative_file, album.image_file, album.album,
album.album_id, track.track_number
FROM
track, album
WHERE
album.album_id = track.album_id
ORDER BY RAND() limit 10;
Here is where I am having trouble though. I also have a table called "trackfilters1" thru "trackfilters10" Each row has an auto incrementing ID field. Therefore, row 10 is data for album_id 10. These fields are populated with 1's and 0's. For example, album #10 has 10 tracks, then trackfilters1.flags will contain "1111111111" if all tracks are to be included in the search. If track 10 was to be excluded, then it would contain "1111111110"
My problem is including this clause.
The latest query I have come up with is the following:
select
track.artist, track.title, track.seconds,
track.track_id, track.relative_file, album.image_file,
album.album, album.album_id, track.track_number
FROM
track, album, trackfilters1, trackfilters2
WHERE
album.album_id = track.album_id
AND
( (album.album_id = trackfilters1.id)
OR
(album.album_id=trackfilters2.id) )
AND
( (mid(trackfilters1.flags, track.track_number,1) = 1)
OR
( mid(trackfilters2.flags, track.track_number,1) = 1))
ORDER BY RAND() limit 2;
however this is causing SQL to hang. I'm presuming that I'm doing something wrong. Does anybody know what it is? I would be open to suggestions if there is an easier way to achieve my end result, I am not set on repairing my broken query if there is a better way to accomplish this.
Additionally, in my trials, I have noticed when I had a working query and added say, trackfilters2 to the FROM clause without using it anywhere in the query, it would hang as well. This makes me wonder. Is this correct behavior? I would think adding to the FROM list without making use of the data would just make the server procure more data, I wouldn't have expected it to hang.

There's not enough information here to determine what's causing the performance issue.
But here's a few suggestions and comments.
Ditch the old-school comma syntax for the join operations, and use the JOIN keyword instead. And relocate the join predicates to an ON clause.
And for heaven's sake, format the SQL so that it's decipherable by someone trying to read it.
There's some questions here... will there always be a matching row in both trackfilters1 and trackfilters2 for rows you want to return? Or could a row be missing from trackfilters2, and you still want to return the row if there's a matching row in trackfilters1? (The answer to that question determines whether you'd want to use an outer join vs an inner join to those tables.)
For best performance with large sets, having appropriate indexes defined is going to be critical.
Use EXPLAIN to see the execution plan.
I suggest you try writing your query like this:
SELECT track.artist
, track.title
, track.seconds
, track.track_id
, track.relative_file
, album.image_file
, album.album
, album.album_id
, track.track_number
FROM track
JOIN album
ON album.album_id = track.album_id
LEFT
JOIN trackfilters1
ON trackfilters1.id = album.album_id
LEFT
JOIN trackfilters2
ON trackfilters2.id = album.album_id
WHERE MID(trackfilters1.flags, track.track_number, 1) = '1'
OR MID(trackfilters2.flags, track.track_number, 1) = '1'
ORDER BY RAND()
LIMIT 2
And if you want help with performance, provide the output from EXPLAIN, and what indexes are defined.

SQL query to select based on many-to-many relationship

This is really a two-part question, but in order not to mix things up, I'll divide into two actual questions. This one is about creating the correct SQL statement for selecting a row based on values in a many-to-many related table:
Now, the question is: what is the absolute simplest way of getting all resources where e.g metadata.category = subject AND where that category's corresponding metadata.value ='introduction'?
I'm sure this could be done in a lot of different ways, but I'm a novice in SQL, so please provide the simplest way possible... (If you could describe briefly what the statement means in plain English that would be great too. I have looked at introductions to SQL, but none of those I have found (for beginners) go into these many-to-many selections.)

The easiest way is to use the EXISTS clause. I'm more familiar with MSSQL but this should be close
SELECT *
FROM resources r
WHERE EXISTS (
SELECT *
FROM metadata_resources mr
INNER JOIN metadata m ON (mr.metadata_id = m.id)
WHERE mr.resource_id = r.id AND m.category = 'subject' AND m.value = 'introduction'
)
Translated into english it's 'return me all records where this subquery returns one or more rows, without returning the data for those rows'. This sub query is correlated to the outer query by the predicate mr.resource_id = r.id which uses the outer row as the predicate value.
I'm sure you can google around for more examples of the EXIST statement

Inexplicable inner query

I have a nested SQL query that is exhibiting results which I can't understand. The query joins the PARTNER and USER tables via the PARTNER_USER table. A partner is basically a collection of users, and the objective of this query is to figure out when the 20th user registered with the partner that has ID 34:
select p.partner_id id,
u.created_on launch_date
from user u join partner_user pu
using (user_id) join partner p
using (partner_id)
where p.partner_id = 34
and u.user_id =
(select nu.user_id
from user nu
join partner_user npu using (user_id)
join partner np using (partner_id)
where np.partner_id = 34
order by nu.created_on limit 19, 1)
However, if I change the 2nd last line to
where np.partner_id = p.partner_id
The query fails with the error message "Subquery returns more than 1 row".
Why does the first query work, but not the second? They look equivalent to me.
Thanks,
Don

JPunyon is right. One or the other query has to run first, and then have its results trimmed after the fact.
If you look at the queries as written, the outer query has to know the result of the inner query to apply its where clause. However, when you specify
where np.partner_id = p.partner_id
in the inner query, then you're trying to make the inner query know the result of the outer query to apply its where clause as well. That's a circular dependency.
As a human, you can read the query and you can tell that in this particular case, you're asking for one particular value in the where clause in the outer query and you're asking to use that same value in the inner query, so it seems as though the database should see that and use the same literal value from the outer query.
In reality, the inner query is simply run first without knowing the possible values of p.partner_id, hence the "multiple rows" error.

Whe you use the = operator to compare with the results of a subquery, your subquery may return only a single row.
if you want to check for all rows that are returned by the subquery, you have to use the IN operator.
AND u.User_Id IN ( SELECT .... )

When you change the where clause in the sub query you're letting the floodgates open. The where clause in the main query doesn't restrict the subquery. So you're getting more than 1 result.
EDIT: What database is this? I've not come across the "using" construct before...

#Jason Punyon
mysql supports the USING construct.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

MySQL Query returns duplicated results - mysql

Use distinct keyword to remove duplicates. Have a look here

Related

Join to tables and String Compare (large data set)

Finding which of an array of IDs has no record with a single query

Complex MySQL query problems and also SQL hangs

SQL query to select based on many-to-many relationship

Inexplicable inner query

Categories

Resources