SQL scalar subquery checking a row was found - sql-server-2008

Introduction
Sometimes instead of a join you can deliberately use a scalar subquery to check that not more than one row was found. For example you might have this query to look up nationality for some person rows.
select p.name,
c.iso
from person p
join person_country_map pcm
on p.id = pcm.person
join country c
on pcm.country = c.id
where p.id in (1, 2, 3)
Now, suppose that the person_country_map is not a functional mapping. A given person may map to more than one country - so the join may find more than one row. Or indeed, a person might not be in the mapping table at all, at least as far as any database constraints are concerned.
But for this particular query I happen to know that the persons I am querying will have exactly one country. That is the assumption I am basing my code on. But I would like to check that assumption where possible - so that if something went wrong and I end up trying to do this query for a person with more than one country, or with no country mapping row, it will die.
Adding a safety check for at most one row
To check for more than one row, you can rewrite the join as a scalar subquery:
select p.name,
(
select c.iso
from person_country_map pcm
join country c
on pcm.country = c.id
where pcm.person = p.id
) as iso
from person p
where p.id in (1, 2, 3)
Now the DBMS will give an error if a person queried maps to two or more countries. It won't return multiple entries for the same person, as the straightforward join would. So I can sleep a bit easier knowing that this error case is being checked for even before any rows are returned to the application. As a careful programmer I might check in the application as well, of course.
Is it possible to have a safety check for no row found?
But what about if there is no row in person_country_map for a person? The scalar subquery will return null in that case, making it roughly equivalent to a left join.
(For the sake of argument assume a foreign key from person_country_map.country to country.id and a unique index on country.id so that particular join will always succeed and find exactly one country row.)
My question
Is there some way I can express in SQL that I want one and exactly one result? A plain scalar subquery is 'zero or one'. I would like to be able to say
select 42, (select exactly one x from t where id = 55)
and have the query fail at runtime if the subquery wouldn't return a row. Of course, the above syntax is fictional and I am sure it wouldn't be that easy.
I am using MSSQL 2008 R2, and in fact this code is in a stored procedure, so I can use TSQL if necessary. (Obviously ordinary declarative SQL is preferable since that can be used in view definitions too.) Of course, I can do an exists check, or I can select a value into a TSQL variable and then explicitly check it for nullness, and so on. I could even select results into a temporary table and then build unique indexes on that table as a check. But is there no more readable and elegant way to mark my assumption that a subquery returns exactly one row, and have that assumption checked by the DBMS?

You are making this harder than it needs to be
You definitely need a foreign key relationship from person_country_map.person to person.id.
Either you have a unique constraint on person_country_map.person or you don't.
If you don't have a unique constraint, then yes, you can have multiple records for the same person_country_map.person.
If you want to know whether you have any duplicates, run
select pcm.person
from person_country_map pcm
group by pcm.person
having count(*) > 1
If there is more than one, you just need to decide which one to return, for example the minimum:
select p.name,
min(c.iso)
from person p
join person_country_map pcm
on p.id = pcm.person
join country c
on pcm.country = c.id
where p.id in (1, 2, 3)
group by p.id, p.name

In MSSQL it appears that isnull only evaluates its second argument if the first is null. So in general you can say
select isnull(x, 0/0)
to give a query which returns x if non-null and dies if that would give null. Applying this to a scalar subquery,
select 42, isnull((select x from t where id = 55), 0/0)
will guarantee that exactly one row is found by the select x subquery. If more than one, the DBMS itself will produce an error; if no row, the division by zero is triggered.
Applying this to the original example leads to the code
select p.name,
-- Get the unique country code of this person.
-- Although the database constraints do not guarantee it in general,
-- for this particular query we expect exactly one row. Check that.
--
isnull((
select c.iso
from person_country_map pcm
join country c
on pcm.country = c.id
where pcm.person = p.id
), 0/0) as iso
from person p
where p.id in (1, 2, 3)
For a better error message you can use a conversion failure instead of division by zero:
select 42, isnull((select x from t where id = 55), convert(int, 'No row found'))
although that will need further convert shenanigans if the value you are fetching from the subquery is not itself an int.
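As a complement to the in-query check, the "exactly one row" assumption can also be verified in the application. The following is a minimal Python sketch, not part of the original answers. It uses an in-memory SQLite database only so that it runs anywhere (SQLite does not reproduce the MSSQL scalar-subquery error or the isnull trick), the sample data is invented, and the helper name iso_for_persons is hypothetical. It counts mapping rows per person through a left join and raises if any count differs from one.

```python
import sqlite3

def iso_for_persons(conn, person_ids):
    """Return {person id: iso}; raise ValueError unless every person has exactly one country."""
    marks = ",".join("?" * len(person_ids))
    rows = conn.execute(
        f"""
        select p.id, count(pcm.person) as n, min(c.iso) as iso
        from person p
        left join person_country_map pcm on pcm.person = p.id
        left join country c on c.id = pcm.country
        where p.id in ({marks})
        group by p.id
        """,
        list(person_ids),
    ).fetchall()
    # count(pcm.person) is 0 when no mapping row exists and > 1 for duplicates.
    bad = [(pid, n) for pid, n, _iso in rows if n != 1]
    if bad:
        raise ValueError(f"expected exactly one country per person, got (id, count): {bad}")
    return {pid: iso for pid, _n, iso in rows}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table person (id integer primary key, name text);
    create table country (id integer primary key, iso text);
    create table person_country_map (person integer, country integer);
    insert into person values (1, 'Ann'), (2, 'Bo'), (3, 'Cy'), (4, 'Di');
    insert into country values (10, 'FR'), (20, 'DE');
    -- Person 3 maps to two countries; person 4 has no mapping row at all.
    insert into person_country_map values (1, 10), (2, 20), (3, 10), (3, 20);
""")

print(iso_for_persons(conn, [1, 2]))  # {1: 'FR', 2: 'DE'}
```

Calling iso_for_persons(conn, [3]) or iso_for_persons(conn, [4]) raises ValueError, covering both the "more than one" and the "no row" cases.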

Related

Using an INNER JOIN without returning any columns from the joined table

Running an INNER JOIN type of query, I get duplicate column names, which can pose a problem. This has been covered here extensively, and I was able to find the solution to this problem (which, aside from anything else, is fairly logical) by SELECTing only the columns I need.
However, i would like to know how i could run such a query without actually returning any of the columns from the joined table.
This is my MySQL query
SELECT * FROM product z
INNER JOIN crosslink__productXmanufacturer a
ON z.id = a.productId
WHERE
(z.title LIKE "%search_term%" OR z.search_keywords LIKE "%search_term%")
AND
z.availability = 1
AND
a.manufacturerId IN (22,23,24)
Question
How would i modify this MySQL query in order to return only columns from product and none of the columns from crosslink__productXmanufacturer?
Add the table name to the *. Replace
SELECT * FROM product z
with
SELECT z.* FROM product z
Often when you are doing this, the intention may be clearer using in or exists rather than a join. The join is being used for filtering, so putting the condition in the where clause makes sense:
SELECT p.*
FROM product p
WHERE (p.title LIKE '%search_term%' OR p.search_keywords LIKE '%search_term%') AND
p.availability = 1 AND
exists (SELECT 1
FROM crosslink__productXmanufacturer pxm
WHERE pxm.productId = p.id AND pxm.manufacturerId IN (22, 23, 24)
);
With the proper index, this should run at least as fast as the join version (the index is crosslink__productXmanufacturer(productId, manufacturerId)). In addition, you don't have to worry about returning duplicate records if there are multiple matches in crosslink__productXmanufacturer.
You may notice two other small changes I made to the query. First, the table aliases are abbreviations of the table names, making the logic easier to follow. Second, the string constants use single quotes (the ANSI standard) rather than double quotes. Using single quotes only for string and date constants helps prevent inadvertent syntax errors.
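The point about duplicate records can be seen with a small experiment. This is a sketch with invented sample data, run against an in-memory SQLite database from Python rather than MySQL, so treat it as an illustration of the general SQL behaviour and not as a MySQL benchmark. Product 1 is linked to two of the requested manufacturers, so the join returns it twice while EXISTS returns it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT, availability INTEGER);
    CREATE TABLE crosslink__productXmanufacturer (productId INTEGER, manufacturerId INTEGER);
    INSERT INTO product VALUES (1, 'red widget', 1), (2, 'blue widget', 1);
    -- Product 1 matches two of the wanted manufacturers; product 2 matches none.
    INSERT INTO crosslink__productXmanufacturer VALUES (1, 22), (1, 23), (2, 99);
""")

# Join used as a filter: one output row per matching crosslink row.
join_ids = [r[0] for r in conn.execute("""
    SELECT p.id
    FROM product p
    INNER JOIN crosslink__productXmanufacturer pxm ON p.id = pxm.productId
    WHERE p.availability = 1 AND pxm.manufacturerId IN (22, 23, 24)
    ORDER BY p.id
""")]

# EXISTS used as a filter: at most one output row per product.
exists_ids = [r[0] for r in conn.execute("""
    SELECT p.id
    FROM product p
    WHERE p.availability = 1
      AND EXISTS (SELECT 1
                  FROM crosslink__productXmanufacturer pxm
                  WHERE pxm.productId = p.id
                    AND pxm.manufacturerId IN (22, 23, 24))
    ORDER BY p.id
""")]

print(join_ids)    # [1, 1]
print(exists_ids)  # [1]
```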

simple joins between 2 mysql tables returning all results every time.. Help!

I just imported a large amount of data into two tables. Let's call them shipments and returns.
When I try a simple join (left or inner) on any criterion shared by these two tables, the query appears to do a cross join, finding every combination instead of only the matching rows.
Each table has a PK id field, but there is no FK relationship between the two, only some shared fields.
I'm currently just trying to relate them on shipment_id.
I feel this has a simple answer. Am I missing a reference or something obvious that is causing this? Thanks!
Here's an example. This should return under 100 rows. Instead it returns hundreds of thousands.
SELECT r.*
FROM returns as r
left outer join shipments as s
on r.shipment_id = s.shipment_id
where r.date = '2011-06-20'
Here is a query that should work:
SELECT T0.*, T1.*
FROM shipments AS T0 LEFT JOIN returns AS T1 ON T0.shipment_id = T1.shipment_id
ORDER BY T0.shipment_id;
This query join assumes 1:1 on the shipment_id
It would be nice if you included the query you were using
You need to specify what you are joining on, otherwise it will do a cartesian join:
SELECT r.*
FROM returns as r
LEFT JOIN shipments as s ON s.shipment_id = r.shipment_id
where r.date = '2011-06-20'
Josh,
I would be interested in seeing what would happen if you forced a join to a specific record or set of records instead of the whole table. Assuming there is a shipment with an id of 5 in your table, you could try:
SELECT r.* FROM returns as r
left join shipments as s
ON 5 = r.shipment_id
WHERE r.date = '2011-06-20'
While just a fancy where clause, it would at least prove that the join you are attempting will eventually work correctly. The issue is that your on clause is always returning true, no matter what the value is. This could be because it's not interpreting the shipment_id as an integer, but instead as a true/false variable where any value evaluates to true.
Original Rejected Solution:
No foreign key relationship should be needed in order to make the joins happen. I'm assuming the PK id fields are integers (or numbers, or whatever your RDBMS equivalent is)?
Can you paste a snippet of your SQL query?
Updating based on posted query:
I would add your explicit join criteria in order to rule out any funny business (my guess is since no criteria is specified, it's using 1=1, which always joins). So I would change your query to look like:
SELECT r.*
FROM returns as r
left join shipments as s ON
s.ShipId = R.ReturnId
where r.date = '2011-06-20'
The issue turned out to be very simple, just not readily apparent until going through all the columns. It turns out that the shipment ID was duplicated through every row as it hit the upper limit for the int datatype. This is why joins were returning every record.
After switching the datatype to bigint and reimporting, everything worked great. Thanks all for looking into it.
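The effect described above is easy to reproduce. The sketch below uses Python with an in-memory SQLite database and invented row counts; it assumes, as the description suggests, that every imported shipment_id ended up as the same value, the signed 32-bit maximum. With a single repeated key, an equality join behaves like a cross join, and comparing COUNT(*) with COUNT(DISTINCT ...) is one quick way to spot the problem.

```python
import sqlite3

INT_MAX = 2147483647  # largest value of a signed 32-bit integer

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments (id INTEGER PRIMARY KEY, shipment_id INTEGER)")
conn.execute("CREATE TABLE returns (id INTEGER PRIMARY KEY, shipment_id INTEGER, date TEXT)")

# Assumption: the import left every row with the same, maxed-out shipment_id.
conn.executemany("INSERT INTO shipments (shipment_id) VALUES (?)", [(INT_MAX,)] * 50)
conn.executemany(
    "INSERT INTO returns (shipment_id, date) VALUES (?, '2011-06-20')",
    [(INT_MAX,)] * 4,
)

# Every return matches every shipment: 4 * 50 rows.
joined = conn.execute("""
    SELECT COUNT(*)
    FROM returns AS r
    LEFT OUTER JOIN shipments AS s ON r.shipment_id = s.shipment_id
    WHERE r.date = '2011-06-20'
""").fetchone()[0]

# Diagnostic: a join key with far fewer distinct values than rows is suspicious.
total, distinct = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT shipment_id) FROM shipments"
).fetchone()

print(joined)           # 200
print(total, distinct)  # 50 1
```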

mysql if/else scenario

Ok, here's a fun one. I have 2 tables: tbl_notes, tbl_notes_categories
Simply, the tbl_notes has a categoryid, and I correlate the 2 tables with that ID. So, nothing too complicated.
I force users to choose a category, from a dropdown input, and stop them from submitting if they don't select something.
However, I want to change this, primarily for learning JOINs and how far I can go with them.
Sooooooo, I am not going to force a user to select a category, and instead, I will default the categoryid to zero, in the tbl_notes. (most users will select a category, but this is for other instances)
In the query, I am locked to showing only the notes that have a categoryid that exists in the tbl_notes_categories table. But, I would like to have a condition if the categoryid is not recognized OR is equal to zero, then specify another String. Like "--Unassigned--", or "--Category does not exist--"
Here's my original query:
SELECT n.notesubject, c.categoryname
FROM `tbl_notes` n, `tbl_notes_categories` c
WHERE n.categoryid = c.categoryid
This will not let me see the notes without a categoryid, so I pulled this one:
SELECT n.notesubject, c.categoryname
FROM `tbl_notes` n
LEFT JOIN `tbl_notes_categories` c ON n.categoryid = c.categoryid
And that helps, but I'm stuck at the 'condition' of displaying alternate text, in the case of a missing category record from the categories table.
In MySQL you can use IFNULL:
SELECT
n.notesubject,
IFNULL(c.categoryname, 'Unknown') AS categoryname
FROM tbl_notes AS n
LEFT JOIN tbl_notes_categories AS c
ON n.categoryid = c.categoryid
This will work if the category is not found, and it will also work if the category id is zero, assuming that you don't have a matching row in your category table, because then it will also not be found. If for some reason you do want a row in the category table with id zero, then you can just set its name to 'Unknown'.
Note that IFNULL is MySQL specific. The function COALESCE will also work and is supported by more databases.
For IF/ELSE logic in general in MySQL, you can use IF or, for a more general solution, a CASE expression: CASE WHEN condition THEN expr1 ELSE expr2 END.
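To show the two labels from the question separately, COALESCE and CASE can be combined. This is a minimal sketch with invented sample rows, run from Python against an in-memory SQLite database instead of MySQL (both COALESCE and CASE are standard SQL, so the query itself should carry over). A categoryid of zero is labelled '--Unassigned--', and any other id with no matching category row is labelled '--Category does not exist--'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_notes_categories (categoryid INTEGER PRIMARY KEY, categoryname TEXT);
    CREATE TABLE tbl_notes (notesubject TEXT, categoryid INTEGER);
    INSERT INTO tbl_notes_categories VALUES (1, 'Work');
    -- One note with a real category, one defaulted to 0, one with a stale id.
    INSERT INTO tbl_notes VALUES ('budget', 1), ('groceries', 0), ('orphan', 7);
""")

rows = conn.execute("""
    SELECT n.notesubject,
           CASE
               WHEN n.categoryid = 0 THEN '--Unassigned--'
               ELSE COALESCE(c.categoryname, '--Category does not exist--')
           END AS categoryname
    FROM tbl_notes AS n
    LEFT JOIN tbl_notes_categories AS c ON n.categoryid = c.categoryid
    ORDER BY n.notesubject
""").fetchall()

for row in rows:
    print(row)
# ('budget', 'Work')
# ('groceries', '--Unassigned--')
# ('orphan', '--Category does not exist--')
```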

What is the size limitation for IN and NOT IN in MySQL

I get out of memory exception in my application, when the condition for IN or NOT IN is very large. I would like to know what is the limitation for that.
Perhaps you would be better off with another way to accomplish your query?
I suggest you load your match values into a single-column table, and then inner-join the column being queried to the single column in the new table.
Rather than
SELECT a, b, c FROM t1 WHERE d in (d1, d2, d3, d4, ...)
build a temp table with 1 column, call it "dval"
dval
----
d1
d2
d3
SELECT a, b, c FROM t1
INNER JOIN temptbl ON t1.d = temptbl.dval
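From application code, the same idea might look like the following. It is a sketch only: Python with an in-memory SQLite database stands in for MySQL, and the table contents and the list of wanted values are invented. The values are bulk-inserted into the temporary table with bound parameters, so the SQL text stays short no matter how many values there are.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [(i, i * 2, i * 3, i) for i in range(10000)],
)

# Stand-in for a very long list of match values (d1, d2, d3, ...).
wanted = range(0, 10000, 7)

# Load the values into a one-column temporary table instead of building IN (...).
conn.execute("CREATE TEMP TABLE temptbl (dval INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO temptbl (dval) VALUES (?)", ((v,) for v in wanted))

rows = conn.execute("""
    SELECT a, b, c
    FROM t1
    INNER JOIN temptbl ON t1.d = temptbl.dval
""").fetchall()

print(len(rows) == len(wanted))  # True: one result row per wanted value
```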
Having to ask about limits when either doing a SQL query or database design is a good indicator that you're doing it wrong.
I only ever use IN and NOT IN when the condition is very small (under 100 rows or so). It performs well in those scenarios. I use an OUTER JOIN when the condition is large as the query doesn't have to look up the "IN" condition for every tuple. You just have to check the table that you want all rows to come from.
For "IN" the join condition IS NOT NULL
For "NOT IN" the join condition IS NULL
e.g.
/* Get purchase orders that have never been rejected */
SELECT po.*
FROM PurchaseOrder po LEFT OUTER JOIN
(/* Get po's that have been rejected */
SELECT po.PurchaseOrderID
FROM PurchaseOrder po INNER JOIN
PurchaseOrderStatus pos ON po.PurchaseOrderID = pos.PurchaseOrderID
WHERE pos.Status = 'REJECTED'
) por ON po.PurchaseOrderID = por.PurchaseOrderID
WHERE por.PurchaseOrderID IS NULL /* We want NOT IN */
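A quick way to convince yourself that the outer-join form is equivalent to NOT IN is to run both on a tiny data set. The sketch below does that from Python with an in-memory SQLite database; the three purchase orders and their statuses are invented for the illustration, and no claim is made here about relative performance on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PurchaseOrder (PurchaseOrderID INTEGER PRIMARY KEY);
    CREATE TABLE PurchaseOrderStatus (PurchaseOrderID INTEGER, Status TEXT);
    INSERT INTO PurchaseOrder VALUES (1), (2), (3);
    -- Order 2 was rejected once; order 3 has no status rows at all.
    INSERT INTO PurchaseOrderStatus VALUES (1, 'APPROVED'), (2, 'REJECTED'), (2, 'APPROVED');
""")

# Anti-join: keep only orders with no match in the rejected set.
anti_join = [r[0] for r in conn.execute("""
    SELECT po.PurchaseOrderID
    FROM PurchaseOrder po
    LEFT OUTER JOIN (
        SELECT pos.PurchaseOrderID
        FROM PurchaseOrderStatus pos
        WHERE pos.Status = 'REJECTED'
    ) por ON po.PurchaseOrderID = por.PurchaseOrderID
    WHERE por.PurchaseOrderID IS NULL
    ORDER BY po.PurchaseOrderID
""")]

# The same question asked with NOT IN.
not_in = [r[0] for r in conn.execute("""
    SELECT po.PurchaseOrderID
    FROM PurchaseOrder po
    WHERE po.PurchaseOrderID NOT IN (
        SELECT pos.PurchaseOrderID
        FROM PurchaseOrderStatus pos
        WHERE pos.Status = 'REJECTED'
    )
    ORDER BY po.PurchaseOrderID
""")]

print(anti_join)  # [1, 3]
print(not_in)     # [1, 3]
```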
I'm having a similar issue, but I'm only passing 100 three-digit ids in my IN clause. When I look at the stack trace, it actually cuts off the comma-separated values in the IN clause. I don't get an error, I just don't get all the results back. Has anyone had an issue like this before? If it's relevant, I'm using the symfony framework... I'm checking to see if it's a propel issue but just wanted to see if it could be SQL.
I have used IN with quite large lists of IDs - I suspect that the memory problem is not in the query itself. How are you retrieving the results?
This query, for example is from a live site:
SELECT DISTINCT c.id, c.name FROM categories c
LEFT JOIN product_categories pc ON c.id = pc.category_id
LEFT JOIN products p ON p.id = pc.product_id
WHERE p.location_id IN (
955,891,901,877,736,918,900,836,846,914,771,773,833,
893,782,742,860,849,850,812,945,775,784,746,1036,863,
750,763,871,817,749,838,986,794,867,758,923,804,733,
949,808,837,741,747,954,939,865,857,787,820,783,760,
911,745,928,818,887,847,978,852
) ORDER BY c.name ASC
My first pass at the code is terribly naive and there are about 10 of these queries on a single page and the database doesn't blink.
You could, of course, be running a list of 100k values which would be a different story altogether.
I don't know what the limit is, but I've run into this problem before as well. I had to rewrite my query something like this:
select * from foo
where id in (select distinct foo_id from bar where ...)

In what order are MySQL JOINs evaluated?

I have the following query:
SELECT c.*
FROM companies AS c
JOIN users AS u USING(companyid)
JOIN jobs AS j USING(userid)
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123;
I have the following questions:
Is the USING syntax synonymous with ON syntax?
Are these joins evaluated left to right? In other words, does this query say: x = companies JOIN users; y = x JOIN jobs; z = y JOIN useraccounts;
If the answer to question 2 is yes, is it safe to assume that the companies table has companyid, userid and jobid columns?
I don't understand how the WHERE clause can be used to pick rows on the companies table when it is referring to the alias "j"
Any help would be appreciated!
USING (fieldname) is a shorthand way of saying ON table1.fieldname = table2.fieldname.
SQL doesn't define the 'order' in which JOINS are done because it is not the nature of the language. Obviously an order has to be specified in the statement, but an INNER JOIN can be considered commutative: you can list them in any order and you will get the same results.
That said, when constructing a SELECT ... JOIN, particularly one that includes LEFT JOINs, I've found it makes sense to regard the third JOIN as joining the new table to the results of the first JOIN, the fourth JOIN as joining the results of the second JOIN, and so on.
More rarely, the specified order can influence the behaviour of the query optimizer, due to the way it influences the heuristics.
No. The way the query is assembled, it requires that companies and users both have a companyid, jobs has a userid and a jobid, and useraccounts has a userid. However, only one of companies or users needs a userid for the JOIN to work.
The WHERE clause is filtering the whole result -- i.e. all JOINed columns -- using a column provided by the jobs table.
I can't answer the bit about the USING syntax. That's weird. I've never seen it before, having always used an ON clause instead.
But what I can tell you is that the order of JOIN operations is determined dynamically by the query optimizer when it constructs its query plan, based on a system of optimization heuristics, some of which are:
Is the JOIN performed on a primary key field? If so, this gets high priority in the query plan.
Is the JOIN performed on a foreign key field? This also gets high priority.
Does an index exist on the joined field? If so, bump the priority.
Is a JOIN operation performed on a field in WHERE clause? Can the WHERE clause expression be evaluated by examining the index (rather than by performing a table scan)? This is a major optimization opportunity, so it gets a major priority bump.
What is the cardinality of the joined column? Columns with high cardinality give the optimizer more opportunities to discriminate against false matches (those that don't satisfy the WHERE clause or the ON clause), so high-cardinality joins are usually processed before low-cardinality joins.
How many actual rows are in the joined table? Joining against a table with only 100 values is going to create less of a data explosion than joining against a table with ten million rows.
Anyhow... the point is... there are a LOT of variables that go into the query execution plan. If you want to see how MySQL optimizes its queries, use the EXPLAIN syntax.
And here's a good article to read:
http://www.informit.com/articles/article.aspx?p=377652
ON EDIT:
To answer your 4th question: You aren't querying the "companies" table. You're querying the joined cross-product of ALL four tables in your FROM and USING clauses.
The "j.jobid" alias is just the fully-qualified name of one of the columns in that joined collection of tables.
In MySQL, it's often interesting to ask the query optimizer what it plans to do, with:
EXPLAIN SELECT [...]
See "7.2.1 Optimizing Queries with EXPLAIN"
Here is a more detailed answer on JOIN precedence. In your case, the JOINs are all commutative. Let's try one where they aren't.
Build schema:
CREATE TABLE users (
name text
);
CREATE TABLE orders (
order_id text,
user_name text
);
CREATE TABLE shipments (
order_id text,
fulfiller text
);
Add data:
INSERT INTO users VALUES ('Bob'), ('Mary');
INSERT INTO orders VALUES ('order1', 'Bob');
INSERT INTO shipments VALUES ('order1', 'Fulfilling Mary');
Run query:
SELECT *
FROM users
LEFT OUTER JOIN orders
ON orders.user_name = users.name
JOIN shipments
ON shipments.order_id = orders.order_id
Result:
Only the Bob row is returned
Analysis:
In this query the LEFT OUTER JOIN was evaluated first and the JOIN was evaluated on the composite result of the LEFT OUTER JOIN.
Second query:
SELECT *
FROM users
LEFT OUTER JOIN (
orders
JOIN shipments
ON shipments.order_id = orders.order_id)
ON orders.user_name = users.name
Result:
One row for Bob (with the fulfillment data) and one row for Mary with NULLs for fulfillment data.
Analysis:
The parentheses changed the evaluation order.
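The two queries above can be run side by side. This sketch uses Python with an in-memory SQLite database instead of MySQL, on the assumption that SQLite treats the parenthesized join the same way for this example; it selects only the user name and the fulfiller to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name text);
    CREATE TABLE orders (order_id text, user_name text);
    CREATE TABLE shipments (order_id text, fulfiller text);
    INSERT INTO users VALUES ('Bob'), ('Mary');
    INSERT INTO orders VALUES ('order1', 'Bob');
    INSERT INTO shipments VALUES ('order1', 'Fulfilling Mary');
""")

# LEFT OUTER JOIN first, then an inner JOIN on the combined result:
# Mary's NULL order_id cannot match a shipment, so her row is dropped.
flat = conn.execute("""
    SELECT users.name, shipments.fulfiller
    FROM users
    LEFT OUTER JOIN orders ON orders.user_name = users.name
    JOIN shipments ON shipments.order_id = orders.order_id
    ORDER BY users.name
""").fetchall()

# Inner JOIN grouped by parentheses, then LEFT OUTER JOIN to that group:
# Mary is kept, with NULL for the shipment data.
nested = conn.execute("""
    SELECT users.name, shipments.fulfiller
    FROM users
    LEFT OUTER JOIN (
        orders
        JOIN shipments ON shipments.order_id = orders.order_id
    ) ON orders.user_name = users.name
    ORDER BY users.name
""").fetchall()

print(flat)    # [('Bob', 'Fulfilling Mary')]
print(nested)  # [('Bob', 'Fulfilling Mary'), ('Mary', None)]
```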
Further MySQL documentation is at https://dev.mysql.com/doc/refman/5.5/en/nested-join-optimization.html
SEE http://dev.mysql.com/doc/refman/5.0/en/join.html
AND start reading here:
Join Processing Changes in MySQL 5.0.12
Beginning with MySQL 5.0.12, natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard. The goal was to align the syntax and semantics of MySQL with respect to NATURAL JOIN and JOIN ... USING according to SQL:2003. However, these changes in join processing can result in different output columns for some joins. Also, some queries that appeared to work correctly in older versions must be rewritten to comply with the standard.
These changes have five main aspects:
The way that MySQL determines the result columns of NATURAL or USING join operations (and thus the result of the entire FROM clause).
Expansion of SELECT * and SELECT tbl_name.* into a list of selected columns.
Resolution of column names in NATURAL or USING joins.
Transformation of NATURAL or USING joins into JOIN ... ON.
Resolution of column names in the ON condition of a JOIN ... ON.
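The first two points, about result columns and SELECT * expansion, are where USING and ON visibly differ. The sketch below illustrates this with Python and an in-memory SQLite database, which also merges the USING column in SELECT *; it is an analogy for the standard behaviour described above and not a test of MySQL itself. The two tiny tables and the helper name column_names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (companyid INTEGER, cname TEXT);
    CREATE TABLE users (userid INTEGER, companyid INTEGER);
    INSERT INTO companies VALUES (1, 'Acme');
    INSERT INTO users VALUES (7, 1);
""")

def column_names(sql):
    """Return the result column names of a query (hypothetical helper)."""
    return [d[0] for d in conn.execute(sql).description]

# USING: the shared join column appears once in SELECT *.
using_cols = column_names(
    "SELECT * FROM companies AS c JOIN users AS u USING (companyid)"
)

# ON: both tables contribute their own companyid column.
on_cols = column_names(
    "SELECT * FROM companies AS c JOIN users AS u ON c.companyid = u.companyid"
)

print(using_cols)                     # ['companyid', 'cname', 'userid']
print(len(using_cols), len(on_cols))  # 3 4
```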
I'm not sure about the ON vs USING part (though this website says they are the same).
As for the ordering question, it's entirely implementation (and probably query) specific. MySQL most likely picks an order when compiling the request. If you do want to enforce a particular order you would have to 'nest' your queries:
SELECT c.*
FROM companies AS c
JOIN (SELECT u.companyid
FROM users AS u
JOIN (SELECT j.userid
FROM jobs AS j
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123) AS ju USING(userid)
) AS cu USING(companyid)
As for part 4: the WHERE clause limits which rows from the jobs table are eligible to be JOINed on. So if there are rows which would join due to matching userids but don't have the correct jobid, they will be omitted.
1) USING is not exactly the same as ON, but it is shorthand for the case where both tables have a column with the same name that you are joining on... see: http://www.java2s.com/Tutorial/MySQL/0100__Table-Join/ThekeywordUSINGcanbeusedasareplacementfortheONkeywordduringthetableJoins.htm
It is more difficult to read in my opinion, so I'd go spelling out the joins.
3) It is not clear from this query, but I would guess it does not.
2) Assuming you are joining through the other tables (not all directly on companies), the order in this query does matter... see comparisons below:
Original:
SELECT c.*
FROM companies AS c
JOIN users AS u USING(companyid)
JOIN jobs AS j USING(userid)
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123
What I think it is likely suggesting:
SELECT c.*
FROM companies AS c
JOIN users AS u on u.companyid = c.companyid
JOIN jobs AS j on j.userid = u.userid
JOIN useraccounts AS us on us.userid = u.userid
WHERE j.jobid = 123
You could switch the lines joining jobs and useraccounts here.
What it would look like if everything joined on company:
SELECT c.*
FROM companies AS c
JOIN users AS u on u.companyid = c.companyid
JOIN jobs AS j on j.userid = c.userid
JOIN useraccounts AS us on us.userid = c.userid
WHERE j.jobid = 123
This doesn't really make logical sense... unless each user has their own company.
4.) The magic of SQL is that you can show only certain columns, but all of them are there for sorting and filtering...
if you returned
SELECT c.*, j.jobid....
you could clearly see what it was filtering on, but the database server doesn't care if you output a row or not for filtering.