I'm trying to find the (set) intersection between two columns in the same table in MySQL. I basically want to find the rows that have either a col1 element that is in the table's col2, or a col2 element that is in the table's col1.
Initially I tried:
SELECT * FROM table WHERE col1 IN (SELECT col2 FROM table)
which was syntactically valid, but the run time is far too high. The number of rows in the table is ~300,000 and the two columns in question are not indexed. I assume the run time is either n^2 or n^3 depending on whether MySQL executes the subquery again for each element of the table or if it stores the result of the subquery temporarily.
Next I thought of taking the union of the two columns (keeping duplicates) and retaining only the elements that appear more than once, because if an element shows up more than once in this union then it must have been present in both columns (assuming each column contains only distinct elements).
Is there a more elegant (i.e. faster) way to find the set intersection between two columns of the same table?
SELECT t1.*
FROM table t1
INNER JOIN table t2
ON t1.col1 = t2.col2
Creating indexes on col1 and col2 would go a long way to help this query as well.
If you only want the values, try the INTERSECT operator (available in MySQL only from version 8.0.31 onward):
(SELECT col1 FROM table) INTERSECT (SELECT col2 FROM table)
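To make the two answers above concrete, here is a minimal sketch that runs both queries against a tiny made-up table. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name "pairs" and its rows are assumptions for illustration only. The last query is one possible way to get the rows the question asks for (col1 found in col2, or col2 found in col1).

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; "pairs" and its data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO pairs (col1, col2) VALUES (?, ?)",
    [(1, 10), (2, 1), (3, 2), (10, 20), (4, 30)],
)
# Indexes on both columns, as suggested in the answer.
conn.execute("CREATE INDEX idx_col1 ON pairs (col1)")
conn.execute("CREATE INDEX idx_col2 ON pairs (col2)")

# Values present in both columns, via a self-join (DISTINCT removes repeats).
join_values = [r[0] for r in conn.execute(
    "SELECT DISTINCT t1.col1 FROM pairs t1 "
    "INNER JOIN pairs t2 ON t1.col1 = t2.col2 ORDER BY t1.col1"
)]

# The same values via INTERSECT.
intersect_values = [r[0] for r in conn.execute(
    "SELECT col1 FROM pairs INTERSECT SELECT col2 FROM pairs ORDER BY 1"
)]

# Rows whose col1 appears somewhere in col2, or whose col2 appears in col1.
matching_row_ids = [r[0] for r in conn.execute(
    "SELECT DISTINCT t1.id FROM pairs t1 "
    "INNER JOIN pairs t2 ON t1.col1 = t2.col2 OR t1.col2 = t2.col1 "
    "ORDER BY t1.id"
)]

print(join_values)       # [1, 2, 10]
print(intersect_values)  # [1, 2, 10]
print(matching_row_ids)  # [1, 2, 3, 4]
```

On a 300,000-row table the query text would be the same; whether MySQL actually uses the indexes for the OR condition is something to check with EXPLAIN rather than assume.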
I have two tables (table1, table2) in a database (both with type InnoDB). They both have a column "article". In table1 "article" is the primary index, in table2 "article" is defined as "unique". Both of those columns have data type varchar(32), also the same collation.
I am trying to get a list of all "article" values which are in table1, but NOT in table2.
table1 contains about 5000 rows, table2 contains about 3000 rows, so I should get at least 2000 "article" values as a result. My query looks like this:
SELECT article FROM table1
WHERE article NOT IN
(SELECT article FROM table2);
But this returns an empty result...
When I do it the other way around (i.e. select all "article"s from table2 which are not in table1), it works, that query returns around 700 values.
I suppose this must have to do with the different index/unique status of "article" in the two tables. But how can I modify the query to get it working?
Use a left join instead. It is faster with many values anyway:
SELECT t1.article
FROM table1 t1
LEFT JOIN table2 t2 ON t1.article = t2.article
WHERE t2.article IS NULL
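I just found a second solution myself (although the accepted answer works fully): adding a WHERE clause to the subquery makes the whole query work. The most likely reason is that NOT IN never evaluates to true once the subquery returns a NULL, and a column defined as "unique" can still hold NULLs, whereas a primary key cannot (which would also explain why the reverse query works). A condition such as WHERE article != "" keeps every non-empty value in table2 but drops the NULLs. So the complete (working) query now looks like this: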
SELECT article FROM table1
WHERE article NOT IN
(SELECT article FROM table2 WHERE article != "");
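One plausible cause of the empty result, which the question does not confirm, is a NULL in table2.article: in SQL, x NOT IN (..., NULL) is never true. The sketch below reproduces that behaviour with Python's built-in sqlite3 module standing in for MySQL; the three-row data set is made up.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (article TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE table2 (article TEXT UNIQUE)")  # UNIQUE still permits NULL
conn.executemany("INSERT INTO table1 VALUES (?)", [("a",), ("b",), ("c",)])
conn.executemany("INSERT INTO table2 VALUES (?)", [("a",), (None,)])

# Plain NOT IN: the NULL in the subquery makes every comparison unknown.
not_in = conn.execute(
    "SELECT article FROM table1 "
    "WHERE article NOT IN (SELECT article FROM table2) ORDER BY article"
).fetchall()

# Filtering NULLs out of the subquery restores the expected rows.
not_in_filtered = conn.execute(
    "SELECT article FROM table1 "
    "WHERE article NOT IN (SELECT article FROM table2 WHERE article IS NOT NULL) "
    "ORDER BY article"
).fetchall()

# The LEFT JOIN ... IS NULL form is not affected by NULLs in table2.
anti_join = conn.execute(
    "SELECT t1.article FROM table1 t1 "
    "LEFT JOIN table2 t2 ON t1.article = t2.article "
    "WHERE t2.article IS NULL ORDER BY t1.article"
).fetchall()

print(not_in)           # []
print(not_in_filtered)  # [('b',), ('c',)]
print(anti_join)        # [('b',), ('c',)]
```

A quick check on the real data would be SELECT COUNT(*) FROM table2 WHERE article IS NULL.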
I have a table with two columns of IDs like this:
How do I convert it into two columns with their corresponding names using MySQL?
Without table definitions, we're just guessing:
We'll assume that the two-column table is named first_table and contains columns named col1 and col2. (I resisted my Dr. Seuss-like temptation to name them thing1 and thing2.)
We'll assume that there's a second table, unfortunately named second_table, that contains columns rid and name.
We can use a query like this:
SELECT t.rid
     , s.name
  FROM ( SELECT t1.col1 AS rid
           FROM first_table t1
          UNION ALL
         SELECT t2.col2
           FROM first_table t2
       ) t
  LEFT
  JOIN second_table s
    ON s.rid = t.rid
 ORDER
    BY t.rid
The inline view t gets the contents of col1 into a set, and then concatenates a second set to it, the contents of col2. This gives us a single list of values.
We wrap that query into a set of parens, to turn it into an inline view (or derived table if we use the MySQL vernacular.)
We can then do a join operation to do the lookup of name, matching on the rid column.
This isn't the only way to do it; there are other query patterns that will return an equivalent result.
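As a rough illustration, the sketch below runs the query above on made-up data, and then shows a different pattern that may be closer to what the question asks for: joining the names table twice so each ID column gets its own name column. It uses Python's sqlite3 module in place of MySQL. The table and column names follow the assumptions already stated in this answer, and the rows are invented.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first_table (col1 INTEGER, col2 INTEGER)")
conn.execute("CREATE TABLE second_table (rid INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO first_table VALUES (?, ?)", [(1, 2), (2, 3), (3, 9)])
conn.executemany("INSERT INTO second_table VALUES (?, ?)",
                 [(1, "ann"), (2, "bob"), (3, "cy")])

# The query from the answer: one combined list of IDs with their names.
name_list = conn.execute(
    "SELECT t.rid, s.name "
    "FROM (SELECT t1.col1 AS rid FROM first_table t1 "
    "      UNION ALL "
    "      SELECT t2.col2 FROM first_table t2) t "
    "LEFT JOIN second_table s ON s.rid = t.rid "
    "ORDER BY t.rid"
).fetchall()

# Alternative pattern: two name columns side by side, one join per ID column.
name_pairs = conn.execute(
    "SELECT n1.name, n2.name "
    "FROM first_table t "
    "LEFT JOIN second_table n1 ON n1.rid = t.col1 "
    "LEFT JOIN second_table n2 ON n2.rid = t.col2 "
    "ORDER BY t.col1, t.col2"
).fetchall()

print(name_list)
# [(1, 'ann'), (2, 'bob'), (2, 'bob'), (3, 'cy'), (3, 'cy'), (9, None)]
print(name_pairs)
# [('ann', 'bob'), ('bob', 'cy'), ('cy', None)]
```

The LEFT JOINs keep IDs that have no matching name (ID 9 here) and report NULL for them; an INNER JOIN would drop those rows instead.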
I have two tables that have almost identical columns. The first table contains the "current" state of a particular record and the second table contains all the previous states of that record (it's a history table). The second table has a FK to the first table.
I'd like to query both tables so I get the record's entire history, including its current state, in one result. I don't think a JOIN is what I'm trying to do as that "joins" multiple tables "horizontally" (one or more columns of one table combined with one or more columns of another table to produce a result that includes columns from both tables). Rather, I'm trying to "join"(???) the tables "vertically" (meaning, no columns are getting added to the result, just that the results from both tables are falling under the same columns in the result set).
Not exactly sure if what I'm expressing makes sense -- or if it's possible in MySQL.
To accomplish this, you could use a UNION between two SELECT statements. I would also suggest selecting from a derived table in the following manner so that you can sort by columns in your result set. Suppose we wanted to combine results from the following two queries:
SELECT FieldA, FieldB FROM table1;
SELECT FieldX, FieldY FROM table2;
We could join these with a UNION statement as follows:
SELECT Field1, Field2 FROM (
SELECT FieldA AS `Field1`, FieldB AS `Field2` FROM table1
UNION SELECT FieldX AS `Field1`, FieldY AS `Field2` FROM table2)
AS `derived_table`
ORDER BY Field1 ASC, Field2 DESC
In this example, I have selected from table1 and table2 fields which are similar, but not identically named, sharing the same data type. They are matched up using aliases (e.g., FieldA in table1 and FieldX in table2 both map to Field1 in the result set, etc.).
If each table has the same column names, field aliasing is not required, and the query becomes simpler.
Note: In MySQL it is necessary to name derived tables, even if the name given is not intended to be used.
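Applied to the current/history scenario from the question, the pattern might look like the sketch below. It uses Python's sqlite3 module in place of MySQL, and the table names (record, record_history), columns, and rows are all assumptions made for illustration. It uses UNION ALL rather than UNION: UNION removes duplicate rows, which is usually not wanted when stacking history rows, while UNION ALL keeps every row.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, status TEXT, changed_at TEXT)")
conn.execute("CREATE TABLE record_history (record_id INTEGER, status TEXT, changed_at TEXT)")
conn.execute("INSERT INTO record VALUES (1, 'shipped', '2024-03-01')")
conn.executemany(
    "INSERT INTO record_history VALUES (?, ?, ?)",
    [(1, "new", "2024-01-01"), (1, "paid", "2024-02-01")],
)

# Stack the current row on top of the history rows, then sort the combined set.
# The literal 'source' column records which table each row came from.
history = conn.execute(
    "SELECT record_id, status, changed_at, source FROM ("
    "  SELECT id AS record_id, status, changed_at, 'current' AS source FROM record"
    "  UNION ALL"
    "  SELECT record_id, status, changed_at, 'history' AS source FROM record_history"
    ") AS derived_table "
    "WHERE record_id = 1 "
    "ORDER BY changed_at DESC"
).fetchall()

for row in history:
    print(row)
# (1, 'shipped', '2024-03-01', 'current')
# (1, 'paid', '2024-02-01', 'history')
# (1, 'new', '2024-01-01', 'history')
```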
UNION.
Select colA, colB From TblA
UNION
Select colA, colB From TblB
You're after a left join on the first table. That will make the right side's id either a number (exists in both) or null (exists only in the left table).
You want
select lhs.* , rhs.id from lhs left join rhs using(Id)
I have two tables containing 6M rows each. I'm trying to join the two using an inner join but the query ran for 2 days without finishing. The join is (note I've used count(*) just to enable me to run an explain, I'm actually using the join in a CTAS):
SELECT count(*)
FROM table1 t1,
table2 t2
WHERE t1.col1 = t2.colA
AND t1.col2 = t2.colB;
After a bit of investigation I've found the below query runs fine:
SELECT count(*)
FROM
(SELECT *
FROM table1) t1,
(SELECT *
FROM table2) t2
WHERE t1.col1 = t2.colA
AND t1.col2 = t2.colB;
The only difference is that, instead of the table, I use the subquery SELECT * FROM table.
Running the explain plans shows that the latter query builds an index when it selects from table2, whereas the first query uses a join buffer (Block Nested Loop).
Surely MySQL is clever enough to work out that the two queries are practically identical and do the same with both queries? I don't see why an index should be needed because a full scan is required for both tables anyway. These are temporary/transitory tables so if I did put an index on, it would literally be just to perform this join.
Is there a way to fix this via MySQL configuration?
You NEED an index on at least ONE of the tables, for example:
create index Temp1 on Table2 ( colA, colB )
Your query reads from Table 1 and joins to Table 2, so even if Table 1 is fully scanned, the engine needs a way to quickly find the matching record(s) in Table 2. If NEITHER table has an index, think of it this way: for every record in Table 1, scan through ALL records in Table 2 and grab all records that match on ColA, ColB. Then go back to Table 1 for the SECOND record and scan through ALL of Table 2 again, and so on.
Being that you have 6M records, you could practically choke a cow (so-to-speak) on performance. By having an index, even on the SECOND table, when the query is on the first record, it can immediately jump to the rows that match ColA, ColB and as soon as those A/B records are done, it goes back to the first table.
Now, for other overhead efficiencies. If you have BOTH tables indexed on respective Col1, Col2 and ColA, ColB, then the engine will have in its memory / cache a whole block of records for each common area and doesn't have to keep going back to the raw data pages for other elements repeatedly.
So, even though you think it might not be practical, an index is still worth having for queries on large tables. Also, if you have multiple records in the first table with the same values for Col1, Col2 (but different values in other columns), and similarly multiple records in the second table with the same ColA, ColB, you would get a Cartesian result for those rows. Consider the following scenario:
Table1
Col1  Col2  OtherColumn
X     Y     blah1
X     Y     blah2
X     Y     blah3

Table2
ColA  ColB  OtherColumn
X     Y     second blah1
X     Y     second blah2
X     Y     second blah3
A simple query like you have
SELECT count(*)
FROM table1 t1,
table2 t2
WHERE t1.col1 = t2.colA
AND t1.col2 = t2.colB;
would result in a count of 9. You have 6M records and a possible Cartesian result? Hopefully this clarifies some problems you may be encountering.
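The scenario above can be reproduced with a short sketch. It uses Python's sqlite3 module in place of MySQL, creates the suggested composite index, runs the join, and adds a GROUP BY check (my own addition, not part of the original answer) that shows which key pairs are duplicated.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; the data is the 3x3 scenario above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 TEXT, col2 TEXT, other TEXT)")
conn.execute("CREATE TABLE table2 (colA TEXT, colB TEXT, other TEXT)")
conn.executemany("INSERT INTO table1 VALUES ('X', 'Y', ?)",
                 [("blah1",), ("blah2",), ("blah3",)])
conn.executemany("INSERT INTO table2 VALUES ('X', 'Y', ?)",
                 [("second blah1",), ("second blah2",), ("second blah3",)])

# Composite index on the join columns of the second table.
conn.execute("CREATE INDEX Temp1 ON table2 (colA, colB)")

# Every table1 row matches every table2 row with the same key: 3 * 3 rows.
match_count = conn.execute(
    "SELECT count(*) FROM table1 t1, table2 t2 "
    "WHERE t1.col1 = t2.colA AND t1.col2 = t2.colB"
).fetchone()[0]

# Key pairs that occur more than once in table1 (a sign of row multiplication).
duplicate_keys = conn.execute(
    "SELECT col1, col2, count(*) FROM table1 "
    "GROUP BY col1, col2 HAVING count(*) > 1"
).fetchall()

print(match_count)     # 9
print(duplicate_keys)  # [('X', 'Y', 3)]
```

Running the same GROUP BY check on both 6M-row tables would show whether the join output is much larger than either input.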
table1:
columns: id, name
table2:
columns: id, name
assoc_table1_table2:
columns: id_table1, id_table2
I need to select all rows from table1 where at least one row in table2 is associated with this row.
What would be an efficient way to do it? Or, more correct in some way?
I'm thinking of:
SELECT DISTINCT t.id, t.name
FROM table1 t
JOIN assoc_table1_table2 a ON t.id=a.id_table1;
or:
SELECT id, name
FROM table1 t WHERE EXISTS (
SELECT *
FROM assoc_table1_table2 a
WHERE t.id=a.id_table1
);
Any ideas on which of the above is generally faster?
(the obvious indices are in place)
Neither.
I'd recommend using a "WHERE EXISTS" as it will give the optimizer more freedom.
Using a COUNT(*) comparison or DISTINCT typically makes the database find every matching row before it can return anything.
You only want to know whether at least 1 row exists, for example, on a billion row table. "WHERE EXISTS" can be satisfied as soon as the db finds the first row. On databases with reasonable optimizers, you should find it works well.
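For reference, a small sketch showing that the two queries from the question return the same rows. It uses Python's sqlite3 module in place of MySQL, and the rows are made up. It says nothing about speed, which depends on the optimizer and is best checked with EXPLAIN on the real tables.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE assoc_table1_table2 (id_table1 INTEGER, id_table2 INTEGER)")
conn.execute("CREATE INDEX idx_assoc_t1 ON assoc_table1_table2 (id_table1)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
# Row 1 has two associations, row 2 has none, row 3 has one.
conn.executemany("INSERT INTO assoc_table1_table2 VALUES (?, ?)",
                 [(1, 10), (1, 11), (3, 10)])

# JOIN + DISTINCT: the join yields row 1 twice, DISTINCT collapses it.
via_join = conn.execute(
    "SELECT DISTINCT t.id, t.name FROM table1 t "
    "JOIN assoc_table1_table2 a ON t.id = a.id_table1 ORDER BY t.id"
).fetchall()

# EXISTS: each table1 row is tested once; the first match is enough.
via_exists = conn.execute(
    "SELECT id, name FROM table1 t WHERE EXISTS ("
    "  SELECT 1 FROM assoc_table1_table2 a WHERE t.id = a.id_table1"
    ") ORDER BY id"
).fetchall()

print(via_join)    # [(1, 'a'), (3, 'c')]
print(via_exists)  # [(1, 'a'), (3, 'c')]
```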