I have an 80,000-row table with a result number between 130000000 and 168000000; the results are paired using the field pid. I need to change the status of rows from 'G' to 'X' where the result pair has a difference of 4300000.
I have come up with the query below, which works but is very slow. Can it be made faster?
UPDATE table1 SET status = 'X'
WHERE id IN (
SELECT id FROM (
SELECT a.id AS id FROM table1 a, table1 b
WHERE a.result = b.result + 4300000
AND a.pid = b.pid
AND a.result between 130000000 and 168000000
AND a.status = 'G'
) AS c
);
The indexes are:-
table1 0 PRIMARY 1 id A 80233 NULL NULL BTREE
table1 1 id 1 id A 80233 NULL NULL BTREE
table1 1 id 2 result A 80233 NULL NULL BTREE
table1 1 id 3 status A 80233 4 NULL YES BTREE
table1 1 id 4 name A 80233 32 NULL BTREE
table1 1 id 5 pid A 80233 16 NULL BTREE
Using a subquery inside the IN (...) clause is often inefficient in MySQL. Instead, you can rewrite the update as a self-join using the UPDATE ... JOIN syntax:
UPDATE table1 AS a
JOIN table1 AS b
ON b.pid = a.pid
AND b.result = a.result - 4300000
SET a.status = 'X'
WHERE a.result between 130000000 and 168000000
AND a.status = 'G'
For good performance (and if I understand NLJ (Nested-Loop Join) correctly), you would need two indexes: (status, result) and (pid).
The first (composite) index will be used to select rows from the table alias a. Since there is a range condition on result, it is better to put status first; if result came first, MySQL would stop using the index at the result column because of the range condition.
The second index will be used for lookups in the joined table alias b, using the NLJ algorithm.
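As a rough sketch, the two indexes could be created as shown here. The index names idx_status_result and idx_pid are made up for illustration, and the statement assumes no equivalent indexes exist yet. It may also be worth testing (pid, result) in place of (pid) alone, since both join conditions on b are equality lookups.

```sql
-- Hypothetical index names; adjust to your own naming scheme.
-- If status or pid are TEXT/BLOB columns, a prefix length is required,
-- e.g. status(4) or pid(16) as in the existing index listing.
ALTER TABLE table1
    ADD INDEX idx_status_result (status, result),  -- equality column first, range column last
    ADD INDEX idx_pid (pid);                       -- lookups into alias b during the join

-- Check which indexes the optimizer picks (EXPLAIN on UPDATE needs MySQL 5.6 or later).
EXPLAIN
UPDATE table1 AS a
JOIN table1 AS b
  ON b.pid = a.pid
 AND b.result = a.result - 4300000
SET a.status = 'X'
WHERE a.result BETWEEN 130000000 AND 168000000
  AND a.status = 'G';
```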
The table table1 contains 1,500,000 rows and 80 fields. I want to remove the duplicates based on field1 and field2. The ID field is unique, so I keep the row with the maximum ID.
Option 1: INSERT statement
insert into table2_unique
select * from table1 a
where a.id = ( select max(b.id) from table1 b
where a.field1 = b.field1
and a.field2 = b.field2 );
But the query fails because of the below error.
Error Code: 1206. The total number of locks exceeds the lock table size
Explain Statement:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 INSERT table2 NULL ALL NULL NULL NULL NULL NULL NULL NULL
1 PRIMARY a NULL ALL NULL NULL NULL NULL 1387764 100 Using where
2 DEPENDENT SUBQUERY b NULL ref field1x,field2x field1x 39 a.field1 537 10 Using where
Option 2: DELETE statement
DELETE n1 FROM table1 n1, table1 n2 WHERE n1.id > n2.id AND n1.field1 = n2.field1 AND n1.field2 = n2.field2
When I execute it, a deadlock occurs.
I am not able to increase the buffer pool size. Please let me know whether I should write the query in a different way.
I increased innodb_buffer_pool_size in the my.ini file, and the query ran in 27 minutes for that volume.
I'm not sure how it will impact the locks, but dependent subqueries (i.e. pushed predicates) in MySQL have never worked very well in my experience. I would have written the first query as:
insert into table2_unique (id, col1, col2, ...col79)
select a.id, a.col1, a.col2, ...a.col79
from table1 a
inner join (
select max(b.id) as id
from table1 b
group by b.col1, b.col2
) as dedup
on a.id = dedup.id;
Trying to update a table using a join is always a bit dodgy. When it's a self-join, it's not surprising that it fails. Using a temporary table and splitting the operation into two steps avoids this.
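The two-step idea can be sketched roughly as follows. The temporary table name keep_ids is hypothetical, and the sketch assumes the goal is to keep the row with the highest id for each (field1, field2) pair, as in the question.

```sql
-- Step 1: record the id to keep for each (field1, field2) pair.
-- keep_ids is a hypothetical name; the index keeps the lookup in step 2 cheap.
CREATE TEMPORARY TABLE keep_ids (INDEX (id))
SELECT MAX(id) AS id
FROM table1
GROUP BY field1, field2;

-- Step 2: delete every row whose id was not recorded in step 1.
DELETE t
FROM table1 AS t
LEFT JOIN keep_ids AS k ON k.id = t.id
WHERE k.id IS NULL;

DROP TEMPORARY TABLE keep_ids;
```

On a table of this size the DELETE may still run into the lock limit. Restricting each run to a range of t.id values is one way to keep every transaction small.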
I need to select all records from a table containing ids that are not "checked" in four other tables. Here's my query, which works:
select id, idDate, idInfo, idChecked from aTable
where id not in (select id from aSimilarTable1 where idChecked is not null)
and id not in (select id from aSimilarTable2 where idChecked is not null)
and id not in (select id from aSimilarTable3 where idChecked is not null)
and id not in (select id from aSimilarTable4 where idChecked is not null)
The tables grow over time, and now this query takes a very long time to run (several minutes, at best). The sizes of the tables are as follows:
aTable - 1000 records
aSimilarTable1, 2, 3, 4 - 50,000 records
I will work on reducing the size of the tables. However, is there a more efficient way to make the above query?
--CLARIFICATION--
Not all ids from aTable are necessarily present in aSimilarTable1, 2, 3 or 4. I am looking for ids in aTable that are either not present in any aSimilarTable or, if present, are not "checked".
--UPDATE--
Explain plan for the query:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY aTable ALL null null null null 796 Using where
5 DEPENDENT SUBQUERY aSimilarTable4 ALL null null null null 21217 Using where
4 DEPENDENT SUBQUERY aSimilarTable3 ALL null null null null 59077 Using where
3 DEPENDENT SUBQUERY aSimilarTable2 ALL null null null null 22936 Using where
2 DEPENDENT SUBQUERY aSimilarTable1 ALL null null null null 49734 Using where
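Use LEFT JOINs, and keep only the rows that found no "checked" match in any of the four tables.
SELECT a.id, a.idDate, a.idInfo, a.idChecked
FROM aTable a
LEFT JOIN aSimilarTable1 b ON a.id = b.id AND b.idChecked IS NOT NULL
LEFT JOIN aSimilarTable2 c ON a.id = c.id AND c.idChecked IS NOT NULL
LEFT JOIN aSimilarTable3 d ON a.id = d.id AND d.idChecked IS NOT NULL
LEFT JOIN aSimilarTable4 e ON a.id = e.id AND e.idChecked IS NOT NULL
WHERE b.id IS NULL AND c.id IS NULL AND d.id IS NULL AND e.id IS NULL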
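The EXPLAIN output in the question shows type ALL with no usable key on every aSimilarTable, so each lookup scans the whole table. A sketch of indexes that might help, assuming id is not already indexed in those tables and idChecked is a short column (the index name idx_id_checked is made up):

```sql
-- Hypothetical index name. (id, idChecked) lets MySQL answer the
-- "is there a checked row for this id?" probe from the index alone.
ALTER TABLE aSimilarTable1 ADD INDEX idx_id_checked (id, idChecked);
ALTER TABLE aSimilarTable2 ADD INDEX idx_id_checked (id, idChecked);
ALTER TABLE aSimilarTable3 ADD INDEX idx_id_checked (id, idChecked);
ALTER TABLE aSimilarTable4 ADD INDEX idx_id_checked (id, idChecked);
```

After adding them, rerunning EXPLAIN should show whether the aSimilarTable rows change from ALL to ref.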
I have the following query:
SELECT *
FROM s
JOIN b ON s.borrowerId = b.id
JOIN (
SELECT MIN(id) AS id
FROM tbl
WHERE dealId IS NULL
GROUP BY borrowerId, created
) s2 ON s.id = s2.id
Is there a simple way to optimize this so that I can do the JOIN directly and utilize indexes?
UPDATE
The created field is part of the GROUP BY clause because, due to the limitations of our version of MySQL and the ORM being used, it is possible to have multiple records with the same created timestamp value. As a result, I need to find the first record for each combination of borrowerId and created.
Typically I might attempt something like this:
SELECT *
FROM s
INNER JOIN b ON s.borrowerId = b.id
LEFT OUTER JOIN s2
ON s.borrowerId = s2.borrowerId
AND s.created = s2.created
AND s.id <> s2.id
AND s.id < s2.id
WHERE s2.id IS NULL
AND s.dealId IS NULL;
But I'm not sure if that works 100% the way I want.
EXPLAIN from MySQL outputs the following:
1 PRIMARY b ALL NULL NULL NULL NULL 129690
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 317751 Using join buffer
1 PRIMARY s eq_ref PRIMARY,borrowerId_2,borrowerId PRIMARY 4 s2.id 1 Using where
2 DERIVED statuses ref dealId dealId 5 183987 Using where; Using temporary; Using filesort
As you can see, it has to query a massive number of records to build the subquery data set and when joining to the derived subquery, no indexes are found and so no indexes are used.
The first query needs this composite index:
INDEX(borrowerId, created, id)
Note that MySQL rarely uses two indexes for one SELECT, but a composite index is often very handy.
The second query seems grossly inefficient.
Please provide SHOW CREATE TABLE for each table.
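The composite index suggested above could be added roughly like this. The table name tbl is taken from the subquery in the question, and the index names are made up. Because the subquery also filters on dealId IS NULL, a variant that leads with dealId may be worth comparing with EXPLAIN; that variant is an assumption, not part of the answer.

```sql
-- Index from the answer; idx_borrower_created_id is a hypothetical name.
ALTER TABLE tbl
    ADD INDEX idx_borrower_created_id (borrowerId, created, id);

-- Assumed alternative to test, covering the dealId filter as well:
-- ALTER TABLE tbl
--     ADD INDEX idx_deal_borrower_created_id (dealId, borrowerId, created, id);

-- Check whether the derived query now uses the index.
EXPLAIN
SELECT MIN(id) AS id
FROM tbl
WHERE dealId IS NULL
GROUP BY borrowerId, created;
```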
I hold a set of nodes in one MySQL table (table1) and a set of edges in another (table2). Nodes have primary keys, and edges reference them as "foreign keys".
**table1**
id label
1 node1
2 node2
3 node3
**table2**
FK_first FK_second rel
1 3 guardian
2 1 guardian
1 3 times
I know the db-design is not perfect, but it's simple...
Now I want the number of 'rel' for every node and run a query like:
SELECT
label,
COUNT( rel ) as freq
FROM
`table1`
LEFT JOIN table2 ON (id=FK_first OR id=FK_second)
GROUP BY label
ORDER BY freq DESC
I have about 1000 nodes and 2000 edges. With ON (id=FK_first) alone, the query is way faster (<1 sec). The query with OR needs about 6 sec, which is very slow.
I would appreciate some comments to speed this up a bit :-)
LEFT JOIN table2 ON (id=FK_first OR id=FK_second) ~6 sec
LEFT JOIN table2 ON (id=FK_first) ~0.16 sec
LEFT JOIN table2 ON (id=FK_second) ~0.16 sec
LEFT JOIN table2 ON id IN (FK_first,FK_second) ~6 sec
EXPLAIN 1:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table1 ALL NULL NULL NULL NULL 2571 Using temporary; Using filesort
1 SIMPLE table2 ALL FK_first,FK_second,FK_first_2 NULL NULL NULL 3858
EXPLAIN 2:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table1 index NULL PRIMARY 2 NULL 2571 Using index; Using temporary; Using filesort
1 SIMPLE table2 ref FK_first,FK_first_2 FK_first_2 4 table1.id 1
Try doing two joins and moving the "OR" into the COUNT() function:
For every row, this joins table2 once on FK_first, then again on FK_second (if it is not already joined to that row via FK_first). Then, in the COUNT, we count only rows where either join's rel column is non-null.
SELECT
label,
COUNT( table2A.rel || table2B.rel ) as freq
FROM
`table1`
LEFT JOIN
table2 as table2A
ON id=table2A.FK_first
LEFT JOIN
table2 as table2B
ON id=table2B.FK_second
AND table2A.FK_first != table2B.FK_first
GROUP BY label
ORDER BY freq DESC
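A different technique, not the one in the answer above, is to stack both edge directions with UNION ALL and join once on a single column, so no OR is needed in the join condition. A minimal sketch, assuming the column names FK_first and FK_second used in the question's query and that both are NOT NULL; the aliases n, e and node_id are made up:

```sql
SELECT
    n.label,
    COUNT(e.rel) AS freq          -- counts matched edges only; unmatched nodes get 0
FROM table1 AS n
LEFT JOIN (
    SELECT FK_first AS node_id, rel FROM table2
    UNION ALL
    -- skip self-loops here so an edge from a node to itself is counted once
    SELECT FK_second AS node_id, rel FROM table2 WHERE FK_second <> FK_first
) AS e ON e.node_id = n.id
GROUP BY n.label
ORDER BY freq DESC;
```

With the three sample edges in the question this gives node1 = 3, node3 = 2 and node2 = 1. Whether it beats the OR join depends on the MySQL version, since the derived table has to be materialized first.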
I have 2 tables
Table 1 tbl1
ID | Name
1 | stack
Table 2 tbl2 (empty table)
ID | Name
I have this query
SELECT id FROM tbl1 WHERE id != (SELECT id FROM tbl2)
My subquery returns null which means the WHERE comparison is id != null and since my id = 1, shouldn't it display the id 1?
However, I keep getting zero rows returned. Why is that?
I really don't know, but have you tried SELECT id FROM tbl1 WHERE id NOT IN (SELECT id FROM tbl2)?
A comparison to NULL always results in unknown.
If you want to check whether something is NULL, you have to use the IS operator.
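A few one-line queries illustrate this. The comments show what MySQL returns for these literal comparisons:

```sql
SELECT 1 != NULL;      -- NULL (unknown), so a WHERE clause rejects the row
SELECT 1 = NULL;       -- NULL as well
SELECT NULL IS NULL;   -- 1
SELECT 1 IS NOT NULL;  -- 1
```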
In order to get the desired result, try using:
SELECT id FROM tbl1 WHERE id NOT IN (SELECT id FROM tbl2);
Your initial query will work as intended only when tbl2 contains exactly one record.
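One caveat worth keeping in mind with NOT IN: if tbl2.id can ever contain NULL, the NOT IN test is never true and the query returns nothing. NOT EXISTS avoids that. A minimal sketch using the tables from the question (the aliases t1 and t2 are made up):

```sql
-- Returns ids from tbl1 that have no matching id in tbl2,
-- including when tbl2 is empty or contains NULL ids.
SELECT t1.id
FROM tbl1 AS t1
WHERE NOT EXISTS (
    SELECT 1
    FROM tbl2 AS t2
    WHERE t2.id = t1.id
);
```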
NULL is a special value; you should use value IS NULL or value IS NOT NULL when checking for it.
Subselects can be very expensive; try using this LEFT JOIN instead:
SELECT tbl1.id FROM tbl1 LEFT JOIN tbl2 ON tbl2.id = tbl1.id WHERE tbl2.id IS NULL