match multiple columns to multiple-row subquery? - mysql

I'm still very much learning MySQL (I'm really only comfortable with basic queries, count, order by etc.). This question has very likely been asked before, but either I don't know what to search for, or I'm too much of a novice to understand the answers:
I have two tables:
tb1 (a,b,path)
tb2 (a,b,value)
I would like to make a query that returns "path" for each row in tb1 whose a,b matches a different query on tb2. In bad mysql, it would be something like:
select
path
from tb1
where
a=(select a from tb2 where value < v1)
and
b=(select b from tb2 where value < v1);
However, this doesn't work, as the subqueries return multiple values. Note that replacing = with in is not good enough, as that would also be true for combinations of a,b values that are not returned by select a,b from tb2 where value < v1
Basically, I have identified an interesting area in (a,b)-space based on tb2, and would like to study the behavior of tb1 within that area (if that makes it any clearer).
thank you :)

This is a job for an INNER JOIN on both a and b:
SELECT
path
FROM
tb1
INNER JOIN tb2 ON tb1.a = tb2.a AND tb1.b = tb2.b
/* add your condition to the WHERE clause */
WHERE tb2.value < v1
The use cases for subqueries in the SELECT list or WHERE clause can very often be handled instead using some type of JOIN. The join will frequently be faster than the subquery, owing to the fact that when using a SELECT or WHERE subquery, the subquery may need to be performed for each row returned, rather than only once.
Beyond the MySQL documentation on JOINs, I would also recommend Jeff Atwood's Visual Explanation of SQL JOINs.
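To make the difference between two independent in tests and a join on both columns concrete, here is a minimal sketch. It runs the queries through Python's sqlite3 module as a stand-in for MySQL, and the table contents and the threshold v1 = 1.0 are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb1 (a INTEGER, b INTEGER, path TEXT);
    CREATE TABLE tb2 (a INTEGER, b INTEGER, value REAL);
    INSERT INTO tb1 VALUES (1, 1, 'p11'), (1, 2, 'p12'), (2, 1, 'p21'), (2, 2, 'p22');
    INSERT INTO tb2 VALUES (1, 1, 0.5), (2, 2, 0.7), (1, 2, 9.0);
""")
v1 = 1.0  # hypothetical threshold

# Two independent IN tests accept any mix of a and b values,
# including pairs such as (2, 1) that never occur together in tb2.
independent_in = conn.execute("""
    SELECT path FROM tb1
    WHERE a IN (SELECT a FROM tb2 WHERE value < ?)
      AND b IN (SELECT b FROM tb2 WHERE value < ?)
    ORDER BY path
""", (v1, v1)).fetchall()

# Joining on both columns keeps only (a, b) pairs that occur together in tb2.
joined = conn.execute("""
    SELECT tb1.path
    FROM tb1
    INNER JOIN tb2 ON tb1.a = tb2.a AND tb1.b = tb2.b
    WHERE tb2.value < ?
    ORDER BY tb1.path
""", (v1,)).fetchall()

print(independent_in)
print(joined)
```

One thing to keep in mind: if tb2 can hold several qualifying rows for the same (a, b) pair, the join returns the matching tb1 row once per match, so a DISTINCT or an EXISTS test may be needed.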

INNER JOIN will do the trick.
You just need two ON criteria in order to match both the a and b values, like so:
SELECT path
FROM tb1
INNER JOIN tb2 ON tb1.a = tb2.a AND tb1.b = tb2.b
WHERE tb2.value < v1

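You can limit each subquery to a single row this way, although this only compares against one row of tb2, and without an ORDER BY the two subqueries are not guaranteed to pick the same row: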
select
path
from tb1
where
a=(select a from tb2 where value < v1 LIMIT 1)
and
b=(select b from tb2 where value < v1 LIMIT 1);

Related

Can I maintain the output of INNER JOIN to be sorted based on the order of the left side table (the 1st table)?

In PostgreSQL - in the following query I perform an INNER JOIN on two tables -
the 1st table (patient_bvi_p) is SORTED. I extract the gene name (a simple string) from the "id4" column and then use this value to perform the INNER JOIN with the 2nd table (geneexpression17p).
My issue is that after performing the INNER JOIN the result of my query is all scrambled.
The rows are no longer being sorted based on the left hand table (patient_bvi_p) while I really need/want them to be.
Can someone please explain what behavior one should expect after performing an INNER JOIN? Shouldn't the output be sorted in the same way the left (first) table was sorted?
Is there a way to somehow maintain the original order? Or should I always assume that after an INNER JOIN the resulting output is unsorted (scrambled), and therefore perform an extra sorting step AFTER doing the INNER JOIN?
My motivation is basically to avoid an extra sorting step and to rely on the original order of my first table.
select
t1.* ,
bvi_d_exp,
bvi_r_exp,
bvi_exp.bvi_lr_rvd
into Patient_bvi_p_exp
from
(
select split_part(id4, '#', 3) genes, *
from patient_bvi_p
) t1
inner join (
select
genename,
bvi_d_exp,
bvi_r_exp,
bvi_lr_rvd
from geneexpression17p
) bvi_exp on lower(t1.genes) = lower(bvi_exp.genename)
The order of rows in a query output is undefined if there is no order by clause. Postgres will output in any way it sees fit. If you want the output to be ordered you must specify an order by. In other words, you should not rely on output order like you describe, it could change if it is not specified. That said, in your example:
select t1.* ,bvi_d_exp,bvi_r_exp,bvi_exp.bvi_lr_rvd
into Patient_bvi_p_exp
from (select split_part(id4, '#', 3)genes,* from patient_bvi_p)
t1 inner join (select genename,bvi_d_exp,bvi_r_exp,bvi_lr_rvd
from geneexpression17p) bvi_exp on lower(t1.genes)= lower(bvi_exp.genename);
I think you are saying that if you do this:
select * from Patient_bvi_p_exp;
You get random ordering. Yes, that is true. Again, don't rely on order. However, you could:
select t1.* ,bvi_d_exp,bvi_r_exp,bvi_exp.bvi_lr_rvd
into Patient_bvi_p_exp
from (select split_part(id4, '#', 3)genes,* from patient_bvi_p)
t1 inner join (select genename,bvi_d_exp,bvi_r_exp,bvi_lr_rvd
from geneexpression17p) bvi_exp on lower(t1.genes)= lower(bvi_exp.genename)
order by bvi_d;
And that will cause your table to be ordered by the bvi_d column (or whichever you want). So, a simple select on that table will probably return it in the correct order. Or, if you already ran your first query, you could:
create index whatever on Patient_bvi_p_exp(bvi_d);
cluster Patient_bvi_p_exp using whatever;
And this would physically reorder the table such that a simple select would return it in the order you desire.
I have to say again, you are safer doing:
select * from Patient_bvi_p_exp order by bvi_d;
"the 1st table (patient_bvi_p) is SORTED"
There is no "sorted" table in SQL. If you want a sorted result, use the order by clause.
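As a rough illustration of the advice above, the sketch below carries an explicit sort key through the join and orders by it. It uses Python's sqlite3 module rather than PostgreSQL, and the tables patient_rows and gene_expr, the sort_key column, and the data are all hypothetical.

```python
import sqlite3

# Hypothetical, simplified stand-ins for patient_bvi_p and geneexpression17p.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient_rows (sort_key INTEGER, gene TEXT);
    CREATE TABLE gene_expr (genename TEXT, expr_level REAL);
    INSERT INTO patient_rows VALUES (1, 'TP53'), (2, 'brca1'), (3, 'EGFR');
    INSERT INTO gene_expr VALUES ('egfr', 0.3), ('tp53', 0.9), ('BRCA1', 0.1);
""")

# The join itself promises no order; the ORDER BY on a column of the
# left-hand table is what makes the result order well defined.
rows = conn.execute("""
    SELECT p.sort_key, p.gene, e.expr_level
    FROM patient_rows AS p
    INNER JOIN gene_expr AS e ON lower(p.gene) = lower(e.genename)
    ORDER BY p.sort_key
""").fetchall()

for row in rows:
    print(row)
```

If the left table has no column that captures its intended order, one would have to be added before the join, since the join cannot recover it afterwards.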

Nested SELECT SQL Queries Workbench

Hi, I have this query, but it's giving me the error "Operand should contain 1 column(s)" and I'm not sure why:
Select *,
(Select *
FROM InstrumentModel
WHERE InstrumentModel.InstrumentModelID=Instrument.InstrumentModelID)
FROM Instrument
Judging by your query, you want data from both the Instrument and InstrumentModel tables. A subquery in the select list may return only one column, but yours returns every column of InstrumentModel (Select *), which is what the error is complaining about. To fetch results from both tables by matching rows, you can use a join, or you can select particular fields as tableName.fieldName and put the matching condition in the where clause, like:
select Instrument.x,InstrumentModel.y
from instrument,instrumentModel
where instrument.x=instrumentModel.y
You can use a join to select from 2 connected tables
select *
from Instrument i
join InstrumentModel m on m.InstrumentModelID = i.InstrumentModelID
When you use subqueries in the column list, they need to return exactly one value. You can read more in the documentation.
As a user commented in the documentation, using subqueries like this can ruin your performance:
When the same subquery is used several times, MySQL does not use this fact to optimize the query, so be careful not to run into performance problems.
Example:
SELECT
col0,
(SELECT col1 FROM table1 WHERE table1.id = table0.id),
(SELECT col2 FROM table1 WHERE table1.id = table0.id)
FROM
table0
WHERE ...
the join of table0 with table1 is executed once for EACH subquery, leading to very bad performance for this kind of query.
Therefore you should rather join the tables, as described by the other answer.
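For comparison, here is a small sketch of both styles side by side, run through Python's sqlite3 module instead of MySQL. The ModelName and Maker columns and the sample rows are assumptions made up for the example; only the table names and InstrumentModelID come from the question.

```python
import sqlite3

# Hypothetical schema: only InstrumentModelID is known from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE InstrumentModel (
        InstrumentModelID INTEGER PRIMARY KEY, ModelName TEXT, Maker TEXT);
    CREATE TABLE Instrument (
        InstrumentID INTEGER PRIMARY KEY, InstrumentModelID INTEGER);
    INSERT INTO InstrumentModel VALUES (10, 'Model A', 'Maker X'), (20, 'Model B', 'Maker Y');
    INSERT INTO Instrument VALUES (1, 10), (2, 20), (3, 10);
""")

# One scalar subquery per wanted column: each returns exactly one column.
per_column = conn.execute("""
    SELECT i.InstrumentID,
           (SELECT m.ModelName FROM InstrumentModel m
             WHERE m.InstrumentModelID = i.InstrumentModelID),
           (SELECT m.Maker FROM InstrumentModel m
             WHERE m.InstrumentModelID = i.InstrumentModelID)
    FROM Instrument i
    ORDER BY i.InstrumentID
""").fetchall()

# A single join fetches both columns in one pass over the match.
joined = conn.execute("""
    SELECT i.InstrumentID, m.ModelName, m.Maker
    FROM Instrument i
    JOIN InstrumentModel m ON m.InstrumentModelID = i.InstrumentModelID
    ORDER BY i.InstrumentID
""").fetchall()

print(per_column == joined)
```

Both forms return the same rows here; the join just avoids repeating the lookup for every extra column.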

How to optimize a MySQL update which contains an "in" subquery?

How do I optimize the following update because the sub-query is being executed for each row in table a?
update
a
set
col = 1
where
col_foreign_id not in (select col_foreign_id in b)
You could potentially use an outer join where there are no matching records instead of your not in:
update table1 a
left join table2 b on a.col_foreign_id = b.col_foreign_id
set a.col = 1
where b.col_foreign_id is null
This should use a simple select type rather than a dependent subquery.
Your current query (or the one that actually works since the example in the OP doesn't look like it would) is potentially dangerous in that a NULL in b.col_foreign_id would cause nothing to match, and you'd update no rows.
not exists would also be something to look at if you want to replace not in.
I can't tell you that this will make your query any faster, but there is some good info here. You'll have to test in your environment.
Here's a SQL Fiddle illuminating the differences between in, exists, and outer join (check the rows returned, null handling, and execution plans).
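The NULL pitfall mentioned above is easy to reproduce. The following is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL, with made-up rows in tables a and b. SQLite does not accept MySQL's update ... left join syntax, so the update at the end uses the not exists form instead.

```python
import sqlite3

# Hypothetical sample data; note the NULL in b.col_foreign_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, col_foreign_id INTEGER, col INTEGER DEFAULT 0);
    CREATE TABLE b (col_foreign_id INTEGER);
    INSERT INTO a (id, col_foreign_id) VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO b VALUES (100), (NULL);
""")

# NOT IN against a set containing NULL is never true, so nothing matches.
not_in = conn.execute("""
    SELECT id FROM a
    WHERE col_foreign_id NOT IN (SELECT col_foreign_id FROM b)
    ORDER BY id
""").fetchall()

# The outer join with an IS NULL test finds the rows without a partner in b.
left_join = conn.execute("""
    SELECT a.id
    FROM a
    LEFT JOIN b ON a.col_foreign_id = b.col_foreign_id
    WHERE b.col_foreign_id IS NULL
    ORDER BY a.id
""").fetchall()

# NOT EXISTS selects the same rows and can drive the update directly.
conn.execute("""
    UPDATE a SET col = 1
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.col_foreign_id = a.col_foreign_id)
""")
updated = conn.execute("SELECT id FROM a WHERE col = 1 ORDER BY id").fetchall()

print(not_in, left_join, updated)
```

Whether either rewrite is actually faster than the original still has to be measured against the real tables and indexes.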

How to find non-existing data from another Table by JOIN?

I have two tables TABLE1 which looks like:
id name address
1 mm 123
2 nn 143
and TABLE2, which looks like:
name age
mm 6
oo 9
I want to get the names that don't exist in TABLE2 by comparing TABLE1 with TABLE2.
So basically, I have to get the 2nd row, whose name nn doesn't exist in TABLE2. The output should look like this:
id name address
2 nn 143
I've tried this but it doesn't work:
SELECT W.* FROM TABLE1 W INNER JOIN TABLE2 V
ON W.NAME <> V.NAME
and it's still getting the existing records.
An INNER JOIN doesn't help here.
One way to solve this is by using a LEFT JOIN:
SELECT w.*
FROM TABLE1 W
LEFT JOIN TABLE2 V ON W.name = V.name
WHERE ISNULL(V.name);
The relational operator you require is semi difference a.k.a. antijoin.
Most SQL products lack an explicit semi difference operator or keyword. Standard SQL-92 doesn't have one (it has a MATCH (subquery) semijoin predicate but, although tempting to think otherwise, the semantics for NOT MATCH (subquery) are not the same as for semi difference; FWIW the truly relational language Tutorial D successfully uses the NOT MATCHING semi difference).
Semi difference can of course be written using other SQL predicates. The most commonly seen are: outer join with a test for nulls in the WHERE clause, closely followed by EXISTS or IN (subquery). Using EXCEPT (equivalent to MINUS in Oracle) is another possible approach if your SQL product supports it and again depending on the data (specifically, when the headings of the two tables are the same).
Personally, I prefer to use EXISTS in SQL for semi difference join because the join clauses are closer together in the written code and it doesn't result in projection over the joined table, e.g.
SELECT *
FROM TABLE1 W
WHERE NOT EXISTS (
SELECT *
FROM TABLE2 V
WHERE W.NAME = V.NAME
);
As with NOT IN (subquery) (same for the outer join approach), you need to take extra care if the WHERE clause within the subquery involves nulls (hint: if WHERE clause in the subquery evaluates UNKNOWN due to the presence of nulls then it will be coerced to be FALSE by EXISTS, which may yield unexpected results).
UPDATE (3 years on): I've since flipped to preferring NOT IN (subquery) because it is more readable and if you are worried about unexpected results with nulls (and you should be) then stop using them entirely, I did many more years ago.
One way in which it is more readable is there is no requirement for the range variables W and V e.g.
SELECT * FROM TABLE1 WHERE name NOT IN ( SELECT name FROM TABLE2 );
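To see why the join on W.NAME <> V.NAME from the question keeps returning existing records, and that the antijoin forms above agree with each other, here is a small sketch on the question's own sample data. It uses Python's sqlite3 module as a stand-in for the actual database, and the column types are assumptions.

```python
import sqlite3

# The question's sample rows; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (id INTEGER, name TEXT, address TEXT);
    CREATE TABLE TABLE2 (name TEXT, age INTEGER);
    INSERT INTO TABLE1 VALUES (1, 'mm', '123'), (2, 'nn', '143');
    INSERT INTO TABLE2 VALUES ('mm', 6), ('oo', 9);
""")

# The attempt from the question: every TABLE1 row pairs with each TABLE2 row
# whose name differs, so 'mm' still shows up (paired with 'oo').
attempt = conn.execute("""
    SELECT W.id, V.name
    FROM TABLE1 W INNER JOIN TABLE2 V ON W.name <> V.name
    ORDER BY W.id, V.name
""").fetchall()

# Antijoin via outer join plus a NULL test.
left_join = conn.execute("""
    SELECT W.*
    FROM TABLE1 W
    LEFT JOIN TABLE2 V ON W.name = V.name
    WHERE V.name IS NULL
""").fetchall()

# Antijoin via NOT EXISTS.
not_exists = conn.execute("""
    SELECT *
    FROM TABLE1 W
    WHERE NOT EXISTS (SELECT * FROM TABLE2 V WHERE W.name = V.name)
""").fetchall()

print(attempt)
print(left_join, not_exists)
```

The sketch writes the NULL test as V.name IS NULL because the one-argument ISNULL() function is MySQL-specific.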

Use ORDER BY 'x' with a JOIN, but keep rows that don't have a value for 'x'

This is a simplified version of a relatively complex problem that my colleagues and I can't quite get our heads around.
Consider two tables, table_a and table_b. In our CMS table_a holds metadata for all the data stored in the database, and table_b has some more specific information, so for simplicity's sake, a title and date column.
At the moment our query looks like:
SELECT *
FROM `table_a` LEFT OUTER JOIN `table_b` ON (table_a.id = table_b.id)
WHERE table_a.col = 'value'
ORDER BY table_b.date ASC
LIMIT 0,20
This degrades badly when table_a has a large number of rows. If the JOIN is changed to a RIGHT OUTER JOIN (which triggers MySQL to use the INDEX set on table_b.date), the query is far quicker, but it doesn't produce the same results (because if table_b.date doesn't have a value, the row is ignored).
This becomes an issue in our CMS because if the user sorts on the date column, any rows that don't have a date set yet disappear from the interface, which creates a confusing UI experience and makes it difficult to add dates for the rows that are missing them.
Is there a solution that will:
Use table_b.date's INDEX so that the query will scale better
Somehow retain those rows in table_b that don't have a date set so that a user can enter the data
I'm going to second ArtoAle's comment. Since the order by applies to a null value in the outer join for missing rows in table_b, those rows will be out of order anyway.
The simulated outer join is the ugly part, so let's look at that first. MySQL doesn't have except (before version 8.0.31), so you need to write the query in terms of exists.
SELECT table_a.col1, table_a.col2, table_a.col3, ... NULL as table_b_col1, NULL as ...
FROM
table_a
WHERE
NOT EXISTS (SELECT 1 FROM table_b WHERE table_b.id = table_a.id);
This should be combined via UNION ALL with the original query rewritten as an inner join. The UNION ALL is needed to preserve the original order.
This sort of query is probably going to be dog-slow no matter what you do, because there won't be an index that readily supports a "Foreign Key not present" sort of query. This basically boils down to an index scan in table_a.id with a lookup (Or maybe a parallel scan) for the corresponding row in table_b.id.
So we ended up implementing a different solution; while the results were not as good as using an INDEX, it still provided a nice speed boost of around 25%.
We removed the JOIN and instead used an ORDER BY subquery:
SELECT *
FROM `table_a`
WHERE table_a.col = 'value'
ORDER BY (
SELECT date
FROM table_b
WHERE id = table_a.id
) ASC
LIMIT 0,20
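A quick way to check what the ORDER BY subquery does with rows that have no date is sketched below, using Python's sqlite3 module as a stand-in for MySQL with made-up rows. It also shows one possible variation, not taken from the original posts, that keeps the undated rows but sorts them after the dated ones. Neither query says anything about index use, which still has to be checked with EXPLAIN on the real data.

```python
import sqlite3

# Hypothetical sample data: row 2 has no entry in table_b, row 4 fails the filter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, col TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, date TEXT);
    INSERT INTO table_a VALUES (1, 'value'), (2, 'value'), (3, 'value'), (4, 'other');
    INSERT INTO table_b VALUES (1, '2012-03-01'), (3, '2011-01-15');
""")

# The ORDER BY subquery keeps undated rows; their NULL sort key comes first
# in ascending order (in SQLite as in MySQL).
subquery_order = conn.execute("""
    SELECT table_a.id
    FROM table_a
    WHERE table_a.col = 'value'
    ORDER BY (SELECT date FROM table_b WHERE table_b.id = table_a.id) ASC
    LIMIT 20
""").fetchall()

# Variation (an assumption, not from the thread): sort on "date IS NULL" first
# so that undated rows are kept but moved to the end.
nulls_last = conn.execute("""
    SELECT table_a.id
    FROM table_a
    LEFT JOIN table_b ON table_a.id = table_b.id
    WHERE table_a.col = 'value'
    ORDER BY table_b.date IS NULL, table_b.date ASC
    LIMIT 20
""").fetchall()

print(subquery_order)
print(nulls_last)
```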