I have a table say table1 with 50 records, table1's records are tied up with other child tables using constraints.
Not all the 50 records have constraints, there could be few records (say 15 ) without constraints, so i want to run a delete query deleting 15 entries alone out of total 50.
I tried delete ignore statement:
delete ignore from table1;
but it didn't help & I got this error:
Cannot delete or update a parent row:
a foreign key constraint fails
What is the best possible way to accomplish this in mysql queries?
DELETE FROM table1 WHERE NOT EXISTS (SELECT * FROM details_table d WHERE d.table1_id = table1.id)
Here's a simple, readable, efficient query that will do it for you:
DELETE FROM table1
WHERE id NOT IN (
SELECT table1_id FROM details_table_1
UNION
SELECT table1_id FROM details_table_2
-- more unions to other child tables as required
);
I've always preferred joins to sub-queries that use IN():
http://dev.mysql.com/doc/refman/5.5/en/rewriting-subqueries.html
Sometimes there are other ways to test
membership in a set of values than by
using a subquery. Also, on some
occasions, it is not only possible to
rewrite a query without a subquery,
but it can be more efficient to make
use of some of these techniques rather
than to use subqueries. One of these
is the IN() construct.
...
A LEFT [OUTER] JOIN can be faster than
an equivalent subquery because the
server might be able to optimize it
better—a fact that is not specific to
MySQL Server alone. Prior to SQL-92,
outer joins did not exist, so
subqueries were the only way to do
certain things. Today, MySQL Server
and many other modern database systems
offer a wide range of outer join
types.
Here's how to answer your question with LEFT OUTER JOIN:
DELETE FROM table1
LEFT OUTER JOIN child_table_1 c1 ON table1.id = c1.table_1_id
LEFT OUTER JOIN child_table_2 c2 ON table1.id = c2.table_1_id
-- More joins for additional child tables here
WHERE c1.table_1_id IS NULL
AND c2.table_1_id IS NULL
-- AND other child tables
;
Related
I am new to database index and I've just read about what an index is, differences between clustered and non clustered and what composite index is.
So for a inner join query like this:
SELECT columnA
FROM table1
INNER JOIN table2
ON table1.columnA= table2.columnA;
In order to speed up the join, should I create 2 indexes, one for table1.columnA and the other for table2.columnA , or just creating 1 index for table1 or table2?
One is good enough? I don't get it, for example, if I select some data from table2 first and based on the result to join on columnA, then I am looping through results one by one from table2, then an index from table2.columnA is totally useless here, because I don't need to find anything in table2 now. So I am needing a index for table1.columnA.
And vice versa, I need a table2.columnA if I select some results from table1 first and want to join on columnA.
Well, I don't know how in reality "select xxxx first then join based on ..." looks like, but that scenario just came into my mind. It would be much appreciated if someone could also give a simple example.
One index is sufficient, but the question is which one?
It depends on how the MySQL optimizer decides to order the tables in the join.
For an inner join, the results are the same for table1 INNER JOIN table2 versus table2 INNER JOIN table1, so the optimizer may choose to change the order. It is not constrained to join the tables in the order you specified them in your query.
The difference from an indexing perspective is whether it will first loop over rows of table1, and do lookups to find matching rows in table2, or vice-versa: loop over rows of table2 and do lookups to find rows in table1.
MySQL does joins as "nested loops". It's as if you had written code in your favorite language like this:
foreach row in table1 {
look up rows in table2 matching table1.column_name
}
This lookup will make use of the index in table2. An index in table1 is not relevant to this example, since your query is scanning every row of table1 anyway.
How can you tell which table order is used? You can use EXPLAIN. It will show you a row for each table reference in the query, and it will present them in the join order.
Keep in mind the presence of an index in either table may influence the optimizer's choice of how to order the tables. It will try to pick the table order that results in the least expensive query.
So maybe it doesn't matter which table you add the index to, because whichever one you put the index on will become the second table in the join order, because it makes it more efficient to do the lookup that way. Use EXPLAIN to find out.
90% of the time in a properly designed relational database, one of the two columns you join together is a primary key, and so should have a clustered index built for it.
So as long as you're in that case, you don't need to do anything at all. The only reason to add additional non-clustered indices is if you're also further filtering the join with a where clause at the end of your statement, you need to make sure both the join columns and the filtered columns are in a correct index together (ie correct sort order, etc).
I have two tables that each contain about 500 customer data records. Each record in each of the tables has an email field. Sometimes the same email addresses exist on both tables, sometimes not. I want to retrieve every email address on table1 that doesn't exist on table2. The email field in each table is indexed. I'm doing the select with a sub query that is really slow, 10 to 20 seconds.
select email
from
t1
where
email not in (select email from t2)
There's actually about 30K rows in each table, but I can knock it down to 500 each very quickly with an additional 'where' to filter by category. It's only when I add that subquery that it slows down dramatically. So, I am sure this can be faster, and I know a join should be much faster than the subquery, but can't figure out how to do that. I found a left outer join explanation here on SO, that looked like it should help, but got nowhere with it. Any help is appreciated.
mysql does not optimize a subquery in the WHERE clause (edit: it re-runs the subquery for every row tested)
to convert to a JOIN, try something like
SELECT email FROM t1
LEFT JOIN t2 ON (t1.email = t2.email)
WHERE t2.email IS NULL
this should run very fast, a covering index query.
The query optimizer should walk the email index of t1, check the
email index of t2, and output those emails that are in t1 but not in t2.
Edit: I should add, mysql does optimize a subquery in the JOIN clause: it runs the subquery and puts the results into a "derived table" (temporary table without any indexes), and joins the derived table like any other. The syntax is a bit funny, each derived table must have an alias, ie ... JOIN (SELECT ...) AS derived ON ....
Usually subqueries do more processing than usual query. In your case it first fetches all the emails from t2 and compares it with the email list of t1.
You can try like below, without using a sub query.
SELECT email FROM t1,t2 WHERE t1.email!=t2.email
The best way to improve the performance of SELECT operations is to create indexes on one or more of the columns that are tested in the query. The index entries act like pointers to the table rows, allowing the query to quickly determine which rows match a condition in the WHERE clause, and retrieve the other column values for those rows. All MySQL data types can be indexed.
some tricks for creating mysql tables ..
see this.
I think this should work fine
SELECT email from T1
LEFT JOIN T2
ON T1.email=T2.email
WHERE T2.email!=NULL
When doing a select query, how does the performance of a table self-join compare with a join between two different tables? For example, if tableA and tableB are exact duplicates (both structure and data), would it be preferable to:
select ... from tableA a inner join tableA b..., or
select ... from tableA a inner join tableB b...
Perhaps its not a straightforward answer, in which case, are there any references on this topic?
I am using MySql.
Thanks!
Assuming that table B is exact copy of table A, and that all necessary indexes are created, self-join of table A should be a bit faster than join of B with A simply because data from table A and its indexes can be reused from cache in order to perform self-join (this may also implicitly give more memory for self-join, and more rows will fit into working buffers).
If table B is not the same, then it is impossible to compare.
Related to my last question (MySQLi performance, multiple (separate) queries vs subqueries) I came across another question.
Sometimes I'm using a subquery to select the value from another table (eg. the username connected to an ID), but I'm not sure about the select-in-select, because it doesn't seem to be very clean and I'm not sure about the performance.
The subquery could look like this:
SELECT
(SELECT `user_name` FROM `users`
WHERE `user_id` = table2.user_id) AS `user_name`
, `value1`
, `value2`
FROM
`table2`
....
Would it be "better" to use a separate query for the result from table1 and another for table2 (doubles the connections, but no need to cross tables), or should I even use a JOIN to get the results in a single query?
I don't have much experience with JOINS and subqueries yet, so I'm not sure if a JOIN would be "too much" in this case, because I really just need one name connected to an ID (or maybe count the number of rows from a table), or if it doesn't matter, because the select-in-select is treated like some kind of JOIN, too..
Solution with JOIN could look like this:
SELECT
users.user_name , table2.value1, table2.value2
FROM
`table2`
INNER JOIN
`users`
ON
users.user_id = table2.user_id
....
And if I should prefer JOIN, which one would be best in this case: left join, inner join or something else?
The very fact that you are asking whether to use inner join or left join indeed shows that you haven't done much work with them.
The purposes of these two are entirely different, inner join is used to return columns from two or more tables where some columns have matching values. left join is used when you want the rows from the table specified left in the join clause to return even when there is no matching column in the other tables. It depends on your application. If one table has names of players, and another table contains details of penalties paid by them, then you will most certainly want to use left join, to account for players without a penalty, and thus without a record in the 2nd table.
Regarding whether to use subquery or join, joins can be much faster when properly used. By properly I mean, when there are indices on the join columns, the tables are specified in increasing order of the number of containing rows (generally. There might be exceptions), the join columns have similar data-types, etc. If all these conditions match, join would be the better option.
What's the difference in a clause done the two following ways?
SELECT * FROM table1 INNER JOIN table2 ON (
table2.col1 = table1.col2 AND
table2.member_id = 4
)
I've compared them both with basic queries and EXPLAIN EXTENDED and don't see a difference. I'm wondering if someone here has discovered a difference in a more complex/processing intensive envornment.
SELECT * FROM table1 INNER JOIN table2 ON (
table2.col1 = table1.col2
)
WHERE table2.member_id = 4
With an INNER join the two approaches give identical results and should produce the same query plan.
However there is a semantic difference between a JOIN (which describes a relationship between two tables) and a WHERE clause (which removes rows from the result set). This semantic difference should tell you which one to use. While it makes no difference to the result or to the performance, choosing the right syntax will help other readers of your code understand it more quickly.
Note that there can be a difference if you use an outer join instead of an inner join. For example, if you change INNER to LEFT and the join condition fails you would still get a row if you used the first method but it would be filtered away if you used the second method (because NULL is not equal to 4).
If you are trying to optimize and know your data, by adding the clause "STRAIGHT_JOIN" can tremendously improve performance. You have an inner join ON... So, just to confirm, you want only records where table1 and table2 are joined, but only for table 2 member ID = some value.. in this case 4.
I would change the query to have table 2 as the primary table of the select as it has an explicit "member_id" that could be optimized by an index to limit rows, then joining to table 1 like
select STRAIGHT_JOIN
t1.*
from
table2 t2,
table1 t1
where
t2.member_id = 4
and t2.col1 = t1.col2
So the query would pre-qualify only the member_id = 4 records, then match between table 1 and 2. So if table 2 had 50,000 records and table 1 had 400,000 records, having table2 listed first will be processed first. Limiting the ID = 4 even less, and even less when joined to table1.
I know for a fact the straight_join works as I've implemented it many times dealing with gov't data of 14+ million records linking to over 15 lookup tables where the engine got confused trying to think for me on the critical table. One such query was taking 24+ hours before hanging... Adding the "STRAIGHT_JOIN" and prioritizing what the "primary" table was in the query dropped it to a final correct result set in under 2 hours.
There's not really much of a difference in the situation you describe; in a situation with multiple complex joins, my understanding is that the first is somewhat preferential, as it will reduce the complexity somewhat; that said, it's going to be a small difference. Overall, you shouldn't notice much of a difference in most if not all situations.
With an inner join, it makes almost* no difference; if you switch to outer join, all the difference in the world.
*I say "almost" because optimizers are quirky beasts and it isn't impossible that under some circumstances, it might do a better job optimizing the former or the latter. Do not attempt to take advantage of this behavior.