I need to process a queue, and MySQL's FOR UPDATE is the tool I'm using for that.
Everything is working fine, except that FOR UPDATE is locking the joined tables too.
I have 4 functions that are called simultaneously, and 2 of them join the same table.
The problem is that it seems to lock the entire joined table rather than just the specific joined rows.
My code looks something like this (just an example):
Query #1:
SELECT t1.id, t1.name, t2.balance FROM t1
LEFT JOIN t2 ON t2.user_id = t1.id
WHERE t2.balance < 0
LIMIT 10
FOR UPDATE
SKIP LOCKED
Query #2:
SELECT t3.id, t3.action, t2.balance FROM t3
LEFT JOIN t2 ON t2.user_id = t3.user_id
WHERE t2.balance < 0
LIMIT 10
FOR UPDATE
SKIP LOCKED
I couldn't find anything about this behavior.
Is there a way to avoid it?
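One avenue, assuming a MySQL 8.0+ server: the locking clause accepts a list of tables, so FOR UPDATE OF restricts the locks to rows of the named tables. A minimal sketch based on query #1:
SELECT t1.id, t1.name, t2.balance FROM t1
LEFT JOIN t2 ON t2.user_id = t1.id
WHERE t2.balance < 0
LIMIT 10
FOR UPDATE OF t1
SKIP LOCKED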
Related
I'm working on a MySQL query that takes 30 seconds to execute. It looks like this:
SELECT id
FROM table1 t1
INNER JOIN table2 t2
ON t1.id = t2.idt2
The INNER JOIN takes 25 of the 30 seconds. When I rewrite it like this:
SELECT id
FROM table1 t1
INNER JOIN (
SELECT idt2,col1,col2,col3
FROM table2
) t2
ON t1.id = t2.idt2
It takes only 8 seconds! Why does this work? I'm afraid of losing data.
(Obviously my real query is more complex than this one; it's just an example.)
Well, you haven't shown us the EXPLAIN output:
EXPLAIN SELECT id
FROM table1 t1
INNER JOIN table2 t2
ON t1.id = t2.idt2
This would definitely give us some insight into your query and table structures.
Based on your scenario, the first query suggests you have an indexing issue.
What happens in your second query is that the optimizer materializes a temporary result set from your subquery, further filtering your data. I don't recommend doing that in most cases.
The purpose of a subquery is to express complex logic, not to be an instant solution for everything.
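For instance, assuming table2.idt2 isn't indexed, the join has to scan table2 for every row of table1. A minimal sketch of the index that would typically help (the index name is illustrative):
ALTER TABLE table2 ADD INDEX idx_table2_idt2 (idt2);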
I have a table containing roughly 5 million rows and 150 columns. Several similar rows should be considered duplicates if they share the same values for 3 columns: ID, Order, and Name.
However, I don't want to delete duplicates at random: the row I consider a duplicate should be the one with the smaller Count value (Count being another column), or, if the counts are equal, the one with the earlier Date (Date being another column).
I have tried the code below:
DELETE t1 FROM uploaddata_copy t1
JOIN uploaddata_copy t2
ON t2.Name = t1.Name
AND t2.ID = t1.ID
AND t2.`Order` = t1.`Order`
AND t2.Count < t1.Count
AND t2.Date < t1.Date
However (and this is probably due to my computer), it seems to run indefinitely (~25 mins) before timing out on the server. I'm left unsure whether the query is correct and I just need to run it for longer, or whether the code is inherently wrong and there is a quicker way of doing it.
A more accurate query would be:
DELETE t1
FROM uploaddata_copy t1 JOIN
     uploaddata_copy t2
     ON t2.Name = t1.Name AND
        t2.ID = t1.ID AND
        t2.`Order` = t1.`Order` AND
        (t2.Count > t1.Count OR
         t2.Count = t1.Count AND t2.Date > t1.Date
        );
However, fixing the logic will not (in this case) improve performance. First, you want an index on uploaddata_copy(Name, ID, Order, Count, Date). This allows the "lookup" to run between the original data and just the index.
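A minimal sketch of that index (the name is illustrative; Order is backticked because it's a reserved word):
ALTER TABLE uploaddata_copy ADD INDEX idx_dedupe (Name, ID, `Order`, Count, Date);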
Second, start small: see how long it takes to remove just a few rows before running the full delete (note that MySQL's multi-table DELETE doesn't accept LIMIT, so time an equivalent SELECT or a single-table form first). Deleting rows is a complicated process, because it affects the table, the indexes, and the transaction log -- not to mention any triggers on the table.
If a lot of rows are being deleted, you might find it faster to re-create the table, but that depends heavily on the relative number of rows being removed.
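A minimal sketch of that rebuild approach, keeping only the "best" row per (Name, ID, Order); the new/old table names are illustrative:
CREATE TABLE uploaddata_new LIKE uploaddata_copy;
INSERT INTO uploaddata_new
SELECT t.*
FROM uploaddata_copy t
WHERE NOT EXISTS (
    SELECT 1 FROM uploaddata_copy better
    WHERE better.Name = t.Name
      AND better.ID = t.ID
      AND better.`Order` = t.`Order`
      AND (better.Count > t.Count OR (better.Count = t.Count AND better.Date > t.Date))
);
RENAME TABLE uploaddata_copy TO uploaddata_old, uploaddata_new TO uploaddata_copy;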
Why the join? You want to delete rows when there exists a "better" record. So use an EXISTS clause:
delete from dup using uploaddata_copy as dup
where exists
(
select *
from uploaddata_copy better
where better.name = dup.name
and better.id = dup.id
and better.`order` = dup.`order`
and (better.count > dup.count or (better.count = dup.count and better.date > dup.date))
);
(Please check my comparisons. This is how I understand it: a better record for name + id + order has a greater count, or the same count and a later date. You consider the worse record an undesired duplicate that you want to delete.)
You'd want an index on uploaddata_copy(id, name, order) at least, or better yet on uploaddata_copy(id, name, order, count, date), for this DELETE statement to perform well.
Please try this:
DELETE t1 FROM uploaddata_copy t1
JOIN uploaddata_copy t2
ON t2.Name = t1.Name
AND t2.ID = t1.ID
AND t2.`Order` = t1.`Order`
AND t2.Count < t1.Count
AND t2.Date < t1.Date
AND t2.primary_key != t1.primary_key
Let me explain at a very high level.
I have two complex SELECT queries (for the sake of example, I've reduced them to the following):
SELECT id, t3_id FROM t1;
SELECT t3_id, MAX(added) as last FROM t2 GROUP BY t3_id;
Query 1 returns 16k rows and query 2 returns 15k.
Each query individually takes less than 1 second to execute.
However, I need to sort the results using the added column from query 2. When I try a LEFT JOIN:
SELECT
t1.id, t1.t3_Id
FROM
t1
LEFT JOIN
(SELECT t3_id, MAX(added) as last FROM t2 GROUP BY t3_id) AS t_t2
ON t_t2.t3_id = t1.t3_id
GROUP BY t1.t3_id
ORDER BY t_t2.last
However, the execution time goes up to over a minute.
I'd like to understand the reason:
what is the cause of such a huge explosion in execution time?
NOTE:
all the columns used in every table are indexed, e.g.:
table t1 has an index on id, t3_Id
table t2 has indexes on t3_id and added
EDIT 1:
Following @Tim Biegeleisen's suggestion, I changed the query to the following, and it now executes in about 16 seconds. If I remove the ORDER BY, it executes in less than 1 second, so the ORDER BY is the sole reason for the slowdown.
SELECT
t1.id, t1.t3_Id
FROM
t1
LEFT JOIN
t2 ON t2.t3_id = t1.t3_id
GROUP BY t1.t3_id
ORDER BY MAX(t2.added)
Even though table t2 has an index on column t3_id, when you join from t1 you are actually joining to a derived table, which either can't use the index or can't use it fully effectively. Since t1 has 16K rows and you are doing a LEFT JOIN, the database engine will need to scan the entire derived table for each record in t1.
You should use MySQL's EXPLAIN to see what the exact execution strategy is, but my suspicion is that the derived table is what is slowing you down.
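For example, prefixing the slow query with EXPLAIN typically shows a DERIVED row for the materialized subquery:
EXPLAIN SELECT t1.id, t1.t3_Id
FROM t1
LEFT JOIN (SELECT t3_id, MAX(added) AS last FROM t2 GROUP BY t3_id) AS t_t2
    ON t_t2.t3_id = t1.t3_id
GROUP BY t1.t3_id
ORDER BY t_t2.last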
The correct query should be:
SELECT
t1.id,
t1.t3_Id,
MAX(t2.added) as last
FROM t1
LEFT JOIN t2 on t1.t3_Id = t2.t3_Id
GROUP BY t1.t3_Id
ORDER BY last;
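If that is still slow, a composite index covering both the join column and the aggregated column usually helps, since MAX(added) per t3_id can then be read straight from the index. A sketch (the index name is illustrative):
ALTER TABLE t2 ADD INDEX idx_t2_t3id_added (t3_id, added);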
This happens because a temporary table is being generated for each record.
I think you could try to order everything after the records are available. Maybe:
select * from (
    select t1x.t3_id, t1x.id, t2x.last from
    (select t3_id, max(id) as id from t1 group by t3_id) as t1x
    left join (select t3_id, max(added) as last from t2 group by t3_id) as t2x
    on t1x.t3_id = t2x.t3_id ) as xx
order by last
I have a query that inserts some data into a table. The query has a join to another table.
Will this other table be locked while the query is running?
Edit:
Here is a query like the one I'm using:
INSERT INTO table_1
SELECT t3.first_row,
t3.second_row
FROM table_2 t2
INNER JOIN
table_3 t3
ON t2.t3_fk = t3.id
WHERE t3.id IN (1, 2, 3, 4)
AND t2.created_at <= '2014-12-21 22:59:59'
The query runs inside a Rails transaction.
It will be locked while inserting; doing the insertion inside a transaction will be much better.
I found a solution: when setting transaction-isolation to "READ-COMMITTED", read statements won't lock the table.
See: http://harrison-fisk.blogspot.ch/2009/02/my-favorite-new-feature-of-mysql-51.html
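For reference, the isolation level can be set per session as well as in the server configuration. A minimal sketch:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- or in my.cnf, under [mysqld]:
-- transaction-isolation = READ-COMMITTED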
I would like to know whether these two versions are equivalent in their results, and which is better for performance and why.
Nested SELECT in SELECT version:
select
t1.c1,
t1.c2,
(select Count(t2.c1) from t2 where t2.id = t1.id) as count_t
from
t1
vs.
select t1.c1,t1.c2, Count(t2.c1)
from t1,t2
where t2.id= t1.id
The first query is the analog of this query:
SELECT
t1.c1,
t1.c2,
COUNT(t2.c1)
FROM t1
LEFT JOIN t2
ON t2.id = t1.id;
It selects all records from the first table, plus the matched records from the second table (that is the LEFT JOIN behavior).
The second is the analog of this query:
SELECT
t1.c1,
t1.c2,
COUNT(t2.c1)
FROM t1
JOIN t2
ON t2.id = t1.id;
It selects only the records matched in both tables (that is the INNER JOIN behavior).
Well, they are different queries. The top one will select all rows from t1, returning 0 for the count if there is no matching id in table t2.
The second query will only return rows where t1 and t2 both have a row with the same id.
The first query will likely suffer from performance issues on large data sets. The second query will potentially have a Cartesian issue. I would go with a JOIN or LEFT JOIN, depending on whether you intend to keep records from table 1 when table 2 has no related records, and then add a GROUP BY to control the Cartesian product.
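A minimal sketch of that suggestion, assuming c1 and c2 identify a t1 row well enough to group by:
SELECT t1.c1, t1.c2, COUNT(t2.c1) AS count_t
FROM t1
LEFT JOIN t2 ON t2.id = t1.id
GROUP BY t1.c1, t1.c2;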