MySQL: How to use outer aliases in a subquery properly?

The first query works just fine. It returns one row from the table 'routepoint': the row that has a certain 'route_id' and for which 'geo_distance()' is at its minimum for the given parameters. I know that the subquery in the FROM clause seems unnecessarily complicated, but in my eyes it helps to highlight the problem with the second query.
The differences are in the last two lines.
SELECT rp.*
FROM routepoint rp, route r, (SELECT * FROM ride_offer WHERE id = 6) as ro
WHERE rp.route_id = r.id
AND r.id = ro.current_route_id
AND geo_distance(rp.lat,rp.lng,52372070,9735690) =
(SELECT MIN(geo_distance(lat,lng,52372070,9735690))
FROM routepoint rp1, ride_offer ro1
WHERE rp1.route_id = ro1.current_route_id AND ro1.id = 6);
The next query does not work at all. It completely freezes MySQL and I have to restart it.
What am I doing wrong? The first subquery returns exactly one row. I don't understand the difference.
SELECT rp.*
FROM routepoint rp, route r, (SELECT * FROM ride_offer WHERE id = 6) as ro
WHERE
rp.route_id = r.id
AND r.id = ro.current_route_id
AND geo_distance(rp.lat,rp.lng,52372070,9735690) =
(SELECT MIN(geo_distance(lat,lng,52372070,9735690))
FROM routepoint rp1
WHERE rp1.route_id = ro.current_route_id);

The problem is, as pointed out by Romain, that this is costly.
This article describes an algorithm that reduces the cost by a 2-step process.
Step 1: Find a bounding box that contains at least one point.
Step 2: Find the closest point by examining all points in the bounding box, which should be a comparatively small number, thus not so costly.
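The two steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the points are plain (x, y) tuples in memory, the distance is Euclidean rather than whatever geo_distance() computes, and the name closest_point is made up for this example. In a database, the box test would be an indexed range condition on lat and lng instead of a scan of a list. The extra widening pass in step 2 is an addition of this sketch, not something the answer describes: a point in a corner of the box can be farther away than a point just outside one of its sides.

```python
import math


def closest_point(points, target, start_radius=1.0):
    """Two-step nearest-point search over (x, y) tuples; None if points is empty."""
    if not points:
        return None
    tx, ty = target

    def in_box(p, r):
        # Axis-aligned box of half-width r around the target.
        return abs(p[0] - tx) <= r and abs(p[1] - ty) <= r

    def dist(p):
        return math.hypot(p[0] - tx, p[1] - ty)

    # Step 1: grow the box until it contains at least one point.
    radius = start_radius
    while not any(in_box(p, radius) for p in points):
        radius *= 2

    # Step 2: examine only the points in the box. The box is widened once to
    # the best distance found so far, so that a closer point lying just
    # outside a side of the first box is not missed.
    best = min((p for p in points if in_box(p, radius)), key=dist)
    candidates = [p for p in points if in_box(p, dist(best))]
    return min(candidates, key=dist)


print(closest_point([(0, 0), (10, 10), (3, 4)], (2, 2)))  # (3, 4)
```

The expensive distance function is only called for the few points inside the box, which is where the saving comes from when the box test itself is cheap.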

Related

Subquery returns more than one row error when one row is returned

I am currently doing some SQL magic and wanted to update the stock in my company's ERP program. However, if I try to run the following query, I get the error mentioned in the title.
update llx_product lp
set stock = (select sum(ps.reel)
from llx_product_stock as ps, llx_entrepot as w
where w.entity IN (1)
and w.rowid = ps.fk_entrepot
and ps.fk_product = lp.rowid
group by ps.rowid)
The subquery by itself returns just one row if used with a rowid for the product.
select sum(ps.reel)
from llx_product_stock as ps, llx_entrepot as w
where w.entity in (1)
and w.rowid = ps.fk_entrepot
and ps.fk_product = 7372
group by ps.rowid
Any help would be appreciated
I would suggest writing the query as:
update llx_product lp
set stock = (select sum(ps.reel)
from llx_product_stock ps join
llx_entrepot w
on w.rowid = ps.fk_entrepot
where w.entity in (1) and
ps.fk_product = lp.rowid
);
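An aggregation query with no group by cannot return more than one row. Your version groups by ps.rowid, while the equality comparison is on ps.fk_product. Assuming ps.rowid identifies a single stock row, a product with more than one stock row (for example, stock in several warehouses) produces one group, and therefore one result row, per stock row. Testing with a single product that happens to have only one stock row would not reveal this.
In any case, without the group by, you cannot get the error you are currently getting.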
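The difference between the grouped and the ungrouped aggregate is easy to check on a toy table. The sketch below uses Python's built-in sqlite3 module rather than MySQL, and the table product_stock with its three rows is invented for the example; only the row counts matter here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_stock (id INTEGER PRIMARY KEY, fk_product INTEGER, reel INTEGER);
    INSERT INTO product_stock (fk_product, reel) VALUES (1, 5), (1, 7), (2, 3);
""")


def rows(sql):
    return conn.execute(sql).fetchall()


# Grouping by the stock row's own key yields one result row per stock row.
grouped = rows("SELECT SUM(reel) FROM product_stock WHERE fk_product = 1 GROUP BY id")
# Without GROUP BY the aggregate always yields exactly one row ...
ungrouped = rows("SELECT SUM(reel) FROM product_stock WHERE fk_product = 1")
# ... even when no row matches (the sum is then NULL).
no_match = rows("SELECT SUM(reel) FROM product_stock WHERE fk_product = 99")

print(sorted(grouped))  # [(5,), (7,)]
print(ungrouped)        # [(12,)]
print(no_match)         # [(None,)]
```

Note that a product without any stock row gets NULL from the ungrouped subquery, so wrapping the sum in COALESCE(..., 0) may be worth considering if NULL stock is not wanted.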
For anyone wondering, the solution to my issue was a very simple query. Gordon pointed me in the right direction; I had made it harder than it should be.
update llx_product lp
left join llx_product_stock ps on lp.rowid = ps.fk_product
set lp.stock = ps.reel

Getting different results when the condition becomes more specific

I have two pieces of code
SELECT * FROM etel.ti18n_country
inner join etel.ti18n
ON id_i18nid = i18nid WHERE id_countryid = 1
and
SELECT * FROM etel.ti18n_country
inner join etel.ti18n
ON id_i18nid = i18nid WHERE id_countryid = 1 and id_i18nid = 4460;
The first returns a bunch of rows, but noticeably none with id_i18nid = 4460.
The second, however, does return the row with id_i18nid = 4460.
How can that be? As I understand MySQL, the first query should have returned a row with id_i18nid = 4460 for the second query to be able to return it as well, since I only made the WHERE clause more specific.
It turns out the problem was that I was using DataGrip's column sorting to find my id. Since I had more than 500 results, DataGrip loaded only part of the result set and sorted just those rows. By ending the statement with ORDER BY id_i18nid DESC I found the row.

MySQL: Remove JOIN for Matched Row if 2nd Round of Criteria Not Met

CONDENSED VERSION
I'm trying to join a new list with my existing database with no unique identifier -- but I'm trying to figure out a way to do it in one query that's more specific than matching by first name/last name but less specific than by all the fields available (first name/middle name/last name/address/phone).
So my idea was to match solely on first/last name and then try to assign each possible matching field with points to see if anyone who matched had 'zero points' and thus have the first name/last name match stripped from them. Here's what I came up with:
SELECT *,
@MidMatch := IF(LEFT(l.middle,1)=LEFT(d.middle,1),"TRUE","FALSE") MidMatch,
@AddressMatch := IF(left(l.address,5)=left(d.address,5),"TRUE","FALSE") AddressMatch,
@PhoneMatch := IF(right(l.phone,4)=right(d.phone,4),"TRUE","FALSE") PhoneMatch,
@Points := IF(@MidMatch = "TRUE",4,0) + IF(@AddressMatch = "TRUE",3,0) + IF(@PhoneMatch = "TRUE",1,0) Points
FROM list l
LEFT JOIN database d on IF(@Points <> 0,(l.first = d.first AND l.last = d.last),(l.first = d.first AND l.last = d.last AND l.address = d.vaddress));
The query runs fine, but it still matches people whose first/last names are identical even if their points are zero (and if their addresses don't match).
Is there a way to do what I'm looking for with this roundabout points system? I've found that it helps me a lot when trying to identify which duplicate to choose, so I'm trying to expand it to the initial match. Or should I do something different?
SPECIFIC VERSION
This is kind of a roundabout idea -- so if somebody has something more straightforward, I'd definitely be willing to bail on this completely and try something else. But basically I have a 93k person table (from a database) that I'm matching against a 92k person table (from a new list). I expect many of them to be the same but certainly not all -- and I'm trying to avoid creating duplicates. Unfortunately, there are no unique identifiers that can be matched, so I'm generally stuck with matching based on some variation of first name, middle name, last name, address, and/or phone number.
The schemas of the two tables (list and database) are nearly identical, with the fields you see above (first name, middle name, last name, address, phone) -- the only difference is that the database table also has a unique numerical ID that I would use to upload back into the database after this match. Unfortunately, the list table has no such ID. Records with the ID would get matched and loaded in on top of the old record, and any record without that ID would get loaded as a new record.
What I'm trying to avoid with this question is creating a bunch of different tables and queries that start with a really specific JOIN statement and then eventually get down to just first and last name -- since there's likely some folks who should match but have moved and/or gotten a new phone number since this last list.
I could write a very simple query as a JOIN and do it numerous times, each time taking out another qualifier:
SELECT *
FROM list l
JOIN database d
ON d.first = l.first AND d.last = l.last AND d.middle = l.middle AND d.address = l.address AND d.phone = l.phone;
And I'd certainly feel confident that those people from the new list matched with the existing people in my database, but it'd only return a very small number of people. Then I'd have to go back and loosen the criteria (e.g. drop the middle name restriction, etc.) and continually create tables, then merge them all back together at the end along with all the ones that didn't match at all, which I would assume are the new people.
But is there a way to write the query solely using a first/last name match, then evaluating the other criteria and wiping the match from people who have zero 'points' (below)? Here's what I attempted to do assigning [arbitrary] points to each match:
SELECT *,
@MidMatch := IF(LEFT(l.middle,1)=LEFT(d.middle,1),"TRUE","FALSE") MidMatch,
@AddressMatch := IF(left(l.address,5)=left(d.address,5),"TRUE","FALSE") AddressMatch,
@PhoneMatch := IF(right(l.phone,4)=right(d.phone,4),"TRUE","FALSE") PhoneMatch,
@Points := IF(@MidMatch = "TRUE",4,0) + IF(@AddressMatch = "TRUE",3,0) + IF(@PhoneMatch = "TRUE",1,0) Points
FROM list l
LEFT JOIN database d on IF(@Points <> 0,(l.first = d.first AND l.last = d.last),(l.first = d.first AND l.last = d.last AND l.address = d.vaddress));
The LEFT and RIGHT formulas within the IF statements are just attempting to control for unstandardized data that gets sent. I also would've done something with a WHERE statement, but I still need the NULL values to return so I know who matched and who didn't. So I ended up attempting to use an IF statement in the LEFT JOIN to say that if the Points cell was equal to zero, that the JOIN statement would get really specific and what I thought would hopefully still return the row but it wouldn't be matched to the database even if their first and last name did.
The query doesn't produce any errors, though unfortunately I'm still getting people back who have zeros in their Points column but matched with the database because their first and last names matched (which is what I was hoping the IF/Points stuff would stop).
Is this potentially a way to avoid bad matches, or am I going down the wrong path? If this isn't the right way to go, is there any other way to write one query that will return a full LEFT JOIN along with NULLs that don't match but have it be more specific than just first/last name but less work than doing a million queries based on a new table each time?
Thanks and hopefully that made some sense!
Your first query:
SELECT *,
@MidMatch := IF(LEFT(l.middle,1)=LEFT(d.middle,1),"TRUE","FALSE") MidMatch,
@AddressMatch := IF(left(l.address,5)=left(d.address,5),"TRUE","FALSE") AddressMatch,
@PhoneMatch := IF(right(l.phone,4)=right(d.phone,4),"TRUE","FALSE") PhoneMatch,
@Points := IF(@MidMatch = "TRUE",4,0) + IF(@AddressMatch = "TRUE",3,0) + IF(@PhoneMatch = "TRUE",1,0) Points
FROM list l LEFT JOIN
database d
on IF(@Points <> 0,(l.first = d.first AND l.last = d.last),(l.first = d.first AND l.last = d.last AND l.address = d.vaddress));
This makes a serious mistake with regard to variables. The simplest case is the SELECT: MySQL does not guarantee the order in which the expressions in a SELECT are evaluated, so they could be calculated in any order, and the logic is wrong if @Points is calculated first. This problem is compounded by referring to variables in different clauses. A SQL statement is a logical description of the result set, not a programmatic statement of how the query is run.
Let me assume that you have a unique identifier for each row in the database (just to identify the row). Then you can get the match by using a correlated subquery:
select l.*,
(select d.databaseid
from database d
where l.first = d.first and l.last = d.last
order by (4 * (LEFT(l.middle, 1) = LEFT(d.middle, 1) ) +
3 * (left(l.address, 5) = left(d.address, 5)) +
1 * (right(l.phone, 4) = right(d.phone, 4))
) desc
limit 1
) as did
from list l;
You can join back to the database table to get more information if you need it.
EDIT:
Your comment made it clear. You don't just want the first and last name but something else as well.
select l.*,
(select d.databaseid
from database d
where l.first = d.first and l.last = d.last and
(LEFT(l.middle, 1) = LEFT(d.middle, 1) or
left(l.address, 5) = left(d.address, 5) or
right(l.phone, 4) = right(d.phone, 4)
)
order by (4 * (LEFT(l.middle, 1) = LEFT(d.middle, 1) ) +
3 * (left(l.address, 5) = left(d.address, 5)) +
1 * (right(l.phone, 4) = right(d.phone, 4))
) desc
limit 1
) as did
from list l;

MySQL: Subquery returns more than 1 row

I know this has been asked plenty of times before, but I can't find an answer that is close to my case.
I have the following query:
SELECT c.cases_ID, c.cases_status, c.cases_title, ci.custinfo_FName, ci.custinfo_LName, c.cases_timestamp, o.organisation_name
FROM db_cases c, db_custinfo ci, db_organisation o
WHERE c.userInfo_ID = ci.userinfo_ID AND c.cases_status = '2'
AND organisation_name = (
SELECT organisation_name
FROM db_sites s, db_cases c
WHERE organisation_ID = '111'
)
AND s.sites_site_ID = c.sites_site_ID)
What I am trying to do is get the cases where the sites_site_ID defined in the case also appears in the db_sites table alongside the organisation_ID I want to filter by (as defined by "organisation_ID = '111'"), but I am getting the response from MySQL stated in the question title.
I hope this makes sense, and I would appreciate any help on this one.
Thanks.
As the error states, your subquery returns more than one row, which it cannot do in this situation. If this is not the expected result, you really should investigate why it occurs. But if you know this will happen and want only the first result, use LIMIT 1 to limit the results to one row.
SELECT organisation_name
FROM db_sites s, db_cases c
WHERE organisation_ID = '111'
LIMIT 1
Well the problem is, obviously, that your subquery returns more than one row which is invalid when using it as a scalar subquery such as with the = operator in the WHERE clause.
Instead you could do an inner join on the subquery which would filter your results to only rows that matched the ON clause. This will get you all rows that match, even if there is more than one returned in the subquery.
UPDATE:
You're likely getting more than one row from your subquery because you're doing a cross join on the db_sites and db_cases table. You're using the old-style join syntax and then not qualifying any predicate to join the tables on in the WHERE clause. Using this old style of joining tables is not recommended for this very reason. It would be better if you explicitly stated what kind of join it was and how the tables should be joined.
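How quickly a missing join predicate multiplies rows can be seen on a toy example. The sketch uses Python's built-in sqlite3 module, and the simplified sites and cases tables are made up for the illustration; the comma join behaves the same way in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (site_id INTEGER, organisation_id INTEGER);
    CREATE TABLE cases (case_id INTEGER, site_id INTEGER);
    INSERT INTO sites VALUES (1, 111), (2, 111), (3, 222);
    INSERT INTO cases VALUES (10, 1), (11, 2), (12, 3), (13, 3);
""")

# Old-style comma join with no predicate linking the tables: every matching
# site is paired with every case (2 sites x 4 cases).
cross = conn.execute(
    "SELECT COUNT(*) FROM sites s, cases c WHERE s.organisation_id = 111"
).fetchone()[0]

# Explicit join: only the cases that belong to the matching sites.
joined = conn.execute(
    "SELECT COUNT(*) FROM sites s JOIN cases c ON c.site_id = s.site_id "
    "WHERE s.organisation_id = 111"
).fetchone()[0]

print(cross, joined)  # 8 2
```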
Good pages on joins:
http://dev.mysql.com/doc/refman/5.0/en/join.html (for the right syntax)
http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html (for the differences between the types of joins)
I was battling this for an hour, and overcomplicated it completely. Sometimes a quick break and writing it out on an online forum can solve it for you ;)
Here is the query as it should be.
SELECT c.cases_ID, c.cases_status, c.cases_title, ci.custinfo_FName, ci.custinfo_LName, c.cases_timestamp, c.sites_site_ID
FROM db_cases c, db_custinfo ci, db_sites s
WHERE c.userInfo_ID = ci.userinfo_ID AND c.cases_status = '2' AND (s.organisation_ID = '111' AND s.sites_site_ID = c.sites_site_ID)
Let me rewrite what you have posted:
SELECT
c.cases_ID, c.cases_status, c.cases_title, ci.custinfo_FName, ci.custinfo_LName,
c.cases_timestamp, c.sites_site_ID
FROM
db_cases c
JOIN
db_custinfo ci ON c.userInfo_ID = ci.userinfo_ID and c.cases_status = '2'
JOIN
db_sites s ON s.sites_site_ID = c.sites_site_ID and s.organisation_ID = 111

Dev Code - Understanding What I Am Seeing

This is where I start by saying I am not a developer and this is not my code. As the DBA, though, it has shown up on my plate from a performance perspective. The execution plan shows me that there are CI scans for Table2 aliased as D and Table2 aliased as E. Focusing on Table2 aliased as E: the scan is coming from the subquery in the WHERE clause for E.SEQ_NBR =
I am also seeing far more executions than need be. I know it depends on the exact index structure on the table, but at a high level, is it likely that what I am seeing is a CI scan resulting from the aggregate (MIN) for every match it finds? Basically, is it walking the table for the MIN SEQ_NBR for each match on EMPLID and the other fields?
If likely, is it more a result of the manner in which it is written (I would think incorporating a CTE with some ROW_NUMBER logic would help) or of a lack of indexing? I am trying to avoid throwing an index at it "just because". I am getting hung up on that subquery in the WHERE clause.
SELECT
D.EMPLID
,D.JOBCODE
,D.DEPTID
,E.DUR
,SUM(D.TL_QUANTITY) 'YTD_TL_QUANTITY'
FROM
Table1 B
,Table2 D
,Table2 E
WHERE
D.TRC = B.TRC
AND B.TL_ERNCD IN ( @0, @1, @2, @3, @4, @5, @6 )
AND D.EMPLID = E.EMPLID
AND D.EMPL_RCD = E.EMPL_RCD
AND D.DUR <= E.DUR
AND D.DUR >= '1/1/' + CAST(DATEPART(YEAR, E.DUR) AS CHAR)
AND E.SEQ_NBR =
( SELECT
MIN(EX.SEQ_NBR)
FROM
Table2 EX
WHERE
E.EMPLID = EX.EMPLID
AND E.EMPL_RCD = EX.EMPL_RCD
AND E.DUR = EX.DUR
)
AND B.EFFDT = ( SELECT
MAX(B_ED.EFFDT)
FROM
Table1 B_ED
WHERE
B.TRC = B_ED.TRC
AND B_ED.EFFDT <= GETDATE()
)
GROUP BY
D.EMPLID
,D.JOBCODE
,D.DEPTID
,E.DUR
The MIN operation has nothing to do with the CI scan. A MIN or MAX is calculated using a sort. The problem is most likely the number of times the subquery is being executed. It has to loop through the subquery for every record returned in the parent query. A CTE may be helpful here depending on the size of Table2, but I don't think you need to worry about finding a replacement for the MIN() ... at least not yet.
Correlated subqueries are performance killers. Remove them and replace them with CTEs and JOINs or derived tables.
Try something like this (not tested)
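SELECT
D.EMPLID
,D.JOBCODE
,D.DEPTID
,E.DUR
,SUM(D.TL_QUANTITY) 'YTD_TL_QUANTITY'
FROM Table1 B
-- latest effective row per TRC, replacing the MAX(EFFDT) subquery
JOIN (SELECT TRC, MAX(EFFDT) AS EFFDT
FROM Table1
WHERE EFFDT <= GETDATE()
GROUP BY TRC) B_ED
ON B.TRC = B_ED.TRC AND B.EFFDT = B_ED.EFFDT
JOIN Table2 D
ON D.TRC = B.TRC
JOIN Table2 E
ON D.EMPLID = E.EMPLID AND D.EMPL_RCD = E.EMPL_RCD AND D.DUR <= E.DUR
-- lowest SEQ_NBR per employee, record and date, replacing the MIN(SEQ_NBR) subquery
JOIN (SELECT EMPLID, EMPL_RCD, DUR, MIN(SEQ_NBR) AS SEQ_NBR
FROM Table2
GROUP BY EMPLID, EMPL_RCD, DUR) EX
ON E.EMPLID = EX.EMPLID
AND E.EMPL_RCD = EX.EMPL_RCD
AND E.DUR = EX.DUR
AND E.SEQ_NBR = EX.SEQ_NBR
WHERE B.TL_ERNCD IN ( @0, @1, @2, @3, @4, @5, @6 )
AND D.DUR >= '1/1/' + CAST(DATEPART(YEAR, E.DUR) AS CHAR)
GROUP BY
D.EMPLID
,D.JOBCODE
,D.DEPTID
,E.DUR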
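As a sanity check that the MIN(SEQ_NBR) filter can be expressed either way, the following sketch compares a correlated subquery with a join to a grouped derived table. It runs on Python's built-in sqlite3 module rather than SQL Server, and the cut-down table t2 with its five rows is invented for the example. It only shows that both forms return the same rows; which one is faster depends on the optimizer and the indexes, so the actual execution plans still need to be compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2 (emplid INTEGER, dur TEXT, seq_nbr INTEGER, qty INTEGER);
    INSERT INTO t2 VALUES
        (1, '2015-01-01', 1, 10), (1, '2015-01-01', 2, 20),
        (1, '2015-01-02', 3, 30),
        (2, '2015-01-01', 5, 50), (2, '2015-01-01', 4, 40);
""")

# Correlated form: the subquery is logically evaluated once per outer row.
correlated = conn.execute("""
    SELECT e.emplid, e.dur, e.seq_nbr, e.qty
    FROM t2 e
    WHERE e.seq_nbr = (SELECT MIN(ex.seq_nbr) FROM t2 ex
                       WHERE ex.emplid = e.emplid AND ex.dur = e.dur)
    ORDER BY e.emplid, e.dur
""").fetchall()

# Derived-table form: the minimum per group is computed once, then joined.
derived = conn.execute("""
    SELECT e.emplid, e.dur, e.seq_nbr, e.qty
    FROM t2 e
    JOIN (SELECT emplid, dur, MIN(seq_nbr) AS seq_nbr
          FROM t2 GROUP BY emplid, dur) ex
      ON ex.emplid = e.emplid AND ex.dur = e.dur AND ex.seq_nbr = e.seq_nbr
    ORDER BY e.emplid, e.dur
""").fetchall()

print(correlated == derived)  # True
print(derived)  # [(1, '2015-01-01', 1, 10), (1, '2015-01-02', 3, 30), (2, '2015-01-01', 4, 40)]
```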
As far as the implicit join syntax goes, do not allow anyone to ever do this again. It is a poor programming technique. As a DBA you can say what you will and will not allow in the database. Code review what is coming in and do not pass it until they remove the implicit syntax.
Why is it bad? In the first place, you get accidental cross joins. Further, from a maintenance perspective, you can't tell whether a cross join was accidental (and thus the query incorrect) or on purpose. This means a query with a cross join in it is unmaintainable.
Next, if you have to change some of the joins to outer joins later and do not fix all the implicit ones at the same time, you can get incorrect results (which may not be noticed by an inexperienced developer). In SQL Server 2008 you cannot use the implicit syntax for an outer join, but it shouldn't have been used even as far back as SQL Server 2000, because Books Online (for SQL Server 2000) states that there are cases where it is misinterpreted. In other words, the syntax is unreliable for outer joins. There is no excuse ever for using an implicit join; you gain nothing from it over an explicit join, and it can create more problems.
You need to educate your developers and tell them that this code (which has been obsolete since 1992!) is no longer acceptable.
This is a quick one, but the expression CAST('1/1/' + CAST(DATEPART(YEAR, E.DUR) AS CHAR) AS DATETIME) is likely causing a table scan on Table2 E, because the function likely has to be evaluated against each row.