I found some strange (to me) behaviour in MySQL. I have a simple query:
SELECT CONVERT( `text`.`old_text`
USING utf8 ) AS stext
FROM `text`
WHERE `text`.`old_id` IN
(
SELECT `revision`.`rev_text_id`
FROM `revision`
WHERE `revision`.`rev_id`
IN
(
SELECT `page_latest`
FROM `page`
WHERE `page_id` = 108
)
)
When I run it, phpMyAdmin shows an execution time of 77.0446 seconds.
But when I replace
WHERE `text`.`old_id` IN
by
WHERE `text`.`old_id` =
its execution time falls to about 0.001 sec. The result of this query
SELECT `revision`.`rev_text_id`
FROM `revision`
WHERE `revision`.`rev_id`
IN
(
SELECT `page_latest`
FROM `page`
WHERE `page_id` = 108
)
is
+------------+
|rev_text_id |
+------------+
|6506 |
+------------+
Can somebody please explain this behaviour?
Try adding an index on the following columns:
ALTER TABLE `text` ADD INDEX idx_text (old_id);
ALTER TABLE `revision` ADD INDEX idx_revision (rev_text_id);
and execute the following query:
SELECT DISTINCT CONVERT(a.`old_text` USING utf8 ) AS stext
FROM `text` a
INNER JOIN `revision` b
ON a.`old_id` = b.`rev_text_id`
INNER JOIN `page` c
ON b.`rev_id` = c.`page_latest`
WHERE c.`page_id` = 108
PS: Can you also run the following queries and post their respective results?
DESC `text`;
DESC `revision`;
DESC `page`;
There are two primary ways you can increase your query performance here:
Add indexes (as Kuya mentioned)
Get rid of the subqueries where possible
For indexes, add an index on the columns you are searching for your matches:
text.old_id, revision.rev_text_id & page.page_id
ALTER TABLE `text` ADD INDEX idx_text (old_id);
ALTER TABLE `revision` ADD INDEX idx_revision (rev_text_id);
ALTER TABLE `page` ADD INDEX idx_page (page_id);
Your next issue is that nested sub-selects are hell on your query execution plan. Here is a good thread discussing JOIN vs Subquery. Here is an article on how to get execution plan info from MySQL.
A first look at an execution plan can be confusing, but it will be your best friend when you have to concern yourself with query optimization.
Here is an example of your same query with just joins (you could use inner or left and get pretty much the same result). I don't have your tables or data, so forgive syntax issues (there is no way I can verify the code works verbatim in your environment, but it should give you a good starting point).
SELECT
CONVERT( `text`.`old_text` USING utf8 ) AS stext
FROM `text`
-- inner join only returns rows when it can find a
-- matching `revision`.`rev_text_id` row to `text`.`old_id`
INNER JOIN `revision`
ON `text`.`old_id` = `revision`.`rev_text_id`
-- inner join only returns rows when it can find a
-- matching `page`.`page_latest` row to `revision`.`rev_id`
INNER JOIN `page`
ON `revision`.`rev_id` = `page`.`page_latest`
WHERE `page`.`page_id` = 108
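To compare what the optimizer does with the nested form and with the JOIN form, each statement can be prefixed with EXPLAIN. This is only a sketch that reuses the table and column names from the question; the plan you get depends on your MySQL version, indexes and data, so no output is shown.

```sql
-- Plan for the original nested-IN form.
EXPLAIN
SELECT CONVERT(`text`.`old_text` USING utf8) AS stext
FROM `text`
WHERE `text`.`old_id` IN (
    SELECT `revision`.`rev_text_id`
    FROM `revision`
    WHERE `revision`.`rev_id` IN (
        SELECT `page_latest`
        FROM `page`
        WHERE `page_id` = 108
    )
);

-- Plan for the JOIN form, for comparison.
EXPLAIN
SELECT CONVERT(`text`.`old_text` USING utf8) AS stext
FROM `text`
INNER JOIN `revision` ON `text`.`old_id` = `revision`.`rev_text_id`
INNER JOIN `page` ON `revision`.`rev_id` = `page`.`page_latest`
WHERE `page`.`page_id` = 108;
```

In the first plan, a select_type of DEPENDENT SUBQUERY combined with a large rows estimate for `text` usually means the subquery is being re-evaluated for every outer row, which would be consistent with the 77-second run time.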
MySQL is looping through each result of the inner query and comparing it with each record in the outer query.
In the second inner query,
WHERE `revision`.`rev_id`
IN
( SELECT `page_latest`
FROM `page`
WHERE `page_id` = 108
)
you should definitely use '=' instead of IN: since you're selecting a single record, there is no point in looping through a result set when you know only one record will be returned each time.
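Applied to the whole statement from the question, that suggestion might look like the sketch below. It assumes that `page_id` is unique in `page` and that `rev_id` is unique in `revision`, so that each subquery yields at most one row; neither assumption is confirmed by the question.

```sql
SELECT CONVERT(`text`.`old_text` USING utf8) AS stext
FROM `text`
WHERE `text`.`old_id` = (
    SELECT `revision`.`rev_text_id`
    FROM `revision`
    -- '=' is only safe because the subquery is assumed to return one row
    WHERE `revision`.`rev_id` = (
        SELECT `page_latest`
        FROM `page`
        WHERE `page_id` = 108
    )
);
```

If either subquery ever returns more than one row, MySQL rejects the statement with error 1242 ("Subquery returns more than 1 row") instead of silently picking one, so the '=' form also acts as a check on that assumption.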
Thanks for past help.
While doing an update using a join, I am getting 'Error Code: 1288. The target table _____ of the UPDATE is not updatable' and can't figure out why. I can update the table with a simple update statement (UPDATE sales.customerABC SET contractID = 'x';) but can't when using a join like this:
UPDATE (
SELECT * #where '*' contains columns a.uniqueID and a.contractID
FROM sales.customerABC
WHERE contractID IS NULL
) as a
LEFT JOIN (
SELECT uniqueID, contractID
FROM sales.tblCustomers
WHERE contractID IS NOT NULL
) as b
ON a.uniqueID = b.uniqueID
SET a.contractID = b.contractID;
If I change that update statement to a SELECT such as:
SELECT * FROM (
SELECT *
FROM opwSales.dealerFilesCTS
WHERE pcrsContractID IS NULL
) as a
LEFT JOIN (
SELECT uniqueID, pcrsContractID
FROM opwSales.dealerFileLoad
WHERE pcrsContractID IS NOT NULL
) as b
ON a."Unique ID" = b.uniqueID;
the result table would contain these columns:
a.uniqueID, a.contractID, b.uniqueID, b.contractID
59682204, NULL, NULL, NULL
a3e8e81d, NULL, NULL, NULL
cfd1dbf9, NULL, NULL, NULL
5ece009c, , 5ece009c, B123
5ece0d04, , 5ece0d04, B456
5ece7ab0, , 5ece7ab0, B789
cfd21d2a, NULL, NULL, NULL
cfd22701, NULL, NULL, NULL
cfd23032, NULL, NULL, NULL
I pretty much have all database privileges and can't find any restrictions on the table reference data. I can't find much information online concerning the error code, either.
Thanks in advance guys.
You cannot update a sub-select because it's not a "real" table - MySQL cannot easily determine how the sub-select assignment maps back to the originating table.
Try:
UPDATE customerABC
JOIN tblCustomers USING (uniqueID)
SET customerABC.contractID = tblCustomers.contractID
WHERE customerABC.contractID IS NULL AND tblCustomers.contractID IS NOT NULL
Notes:
You can use a plain (inner) JOIN instead of a LEFT JOIN, since you want uniqueID to exist and not be null in both tables. A LEFT JOIN would generate extra NULL rows from tblCustomers, only to have them shot down by the requirement that tblCustomers.contractID be not NULL. Since they allow more stringent restrictions on indexes, JOINs tend to be more efficient than LEFT JOINs.
Since the field has the same name in both tables, you can replace ON (a.field1 = b.field1) with the USING (field1) shortcut.
You obviously strongly want a covering index on (uniqueID, contractID) on both tables to maximize efficiency.
This is not going to work unless you have "real" tables for the update. "tblCustomers" may be a view or a subselect, but customerABC may not. If the original 'SELECT * FROM customerABC' was in fact a more complex query than a straight SELECT, you might need a more complicated JOIN to pull out a complex WHERE that would otherwise be hidden inside a subselect. What this boils down to is that MySQL needs a strong unique key to know what it needs to update, and it must be in a single table. To reliably update more than one table I think you need two UPDATEs inside a properly write-locked transaction.
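Before running the UPDATE, the same join and conditions can be turned into a SELECT to preview which rows would change. This is a sketch using the table names from the answer above; the column aliases current_contractID and new_contractID are made up for illustration.

```sql
-- Preview: rows in customerABC that the UPDATE would touch,
-- together with the value each one would receive.
SELECT customerABC.uniqueID,
       customerABC.contractID  AS current_contractID,
       tblCustomers.contractID AS new_contractID
FROM customerABC
JOIN tblCustomers USING (uniqueID)
WHERE customerABC.contractID IS NULL
  AND tblCustomers.contractID IS NOT NULL;
```

If a uniqueID shows up more than once in this preview, tblCustomers holds several candidate contractID values for it, and which one the UPDATE ends up writing is not something to rely on.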
I have a MySQL table that looks (very simplified) like this:
CREATE TABLE `logging` (
`id` bigint(20) NOT NULL,
`time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`level` smallint(3) NOT NULL,
`message` longtext CHARACTER SET utf8 COLLATE utf8_general_mysql500_ci NOT NULL
);
I would like to delete all rows of a specific level, except the last one (time is most recent).
Is there a way to select all rows with level set to a specific value and then delete all rows except the latest one in one single SQL query? How would I start solving this problem?
(As I said, this is a very simplified table, so please don't try to discuss possible design problems of this table. I removed some columns. It is designed per PSR-3 logging standard and I don't think there is an easy way to change that. What I want to solve is how I can select from a table and then delete all but some rows of the same table. I have only intermediate knowledge of MySQL.)
Thank you for pushing me in the right direction :)
Edit:
The Database version is /usr/sbin/mysqld Ver 8.0.18-0ubuntu0.19.10.1 for Linux on x86_64 ((Ubuntu))
You can use the ROW_NUMBER() analytic function (as you are using DB version 8+):
DELETE lg FROM `logging` AS lg
WHERE lg.`id` IN
( SELECT t.`id`
FROM
(
SELECT t.*,
ROW_NUMBER() OVER (ORDER BY `time` DESC) as rn
FROM `logging` t
-- WHERE `level` = @lvl -- optionally add this line to restrict to a specific value of `level`
) t
WHERE t.rn > 1
)
This deletes all of the rows except the most recent one (assuming id is your primary key column).
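If the goal is to keep the newest row of every level in one pass, rather than filtering on a single level, the window can be partitioned by `level`. This is a sketch under the assumptions that MySQL 8.0+ is in use (as stated in the question) and that `id` is unique; the alias ranked and the `id` tie-breaker are additions for illustration, not part of the original answer.

```sql
DELETE lg
FROM `logging` AS lg
JOIN (
    -- Number the rows of each level from newest to oldest;
    -- `id` breaks ties between rows with the same timestamp.
    SELECT `id`,
           ROW_NUMBER() OVER (PARTITION BY `level`
                              ORDER BY `time` DESC, `id` DESC) AS rn
    FROM `logging`
) AS ranked ON ranked.`id` = lg.`id`
-- rn = 1 is the newest row of its level; everything else is removed.
WHERE ranked.rn > 1;
```

Running the inner SELECT on its own first shows which rows get rn = 1 and will therefore survive.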
You can do this:
SELECT COUNT(time) FROM logging WHERE level=some_level INTO @TIME_COUNT;
SET @TIME_COUNT = @TIME_COUNT-1;
PREPARE STMT FROM 'DELETE FROM logging WHERE level=some_level ORDER BY time ASC LIMIT ?;';
EXECUTE STMT USING @TIME_COUNT;
If you have an AUTO_INCREMENT id column - I would use it to determine the most recent entry. Here is one way doing that:
delete l
from (
select l1.level, max(id) as id
from logging l1
where l1.level = @level
) m
join logging l
on l.level = m.level
and l.id < m.id
An index on (level) should give you good performance and will support the MAX() subquery as well as the JOIN.
View on DB Fiddle
If you really need to use the time column, you can modify the query as follows:
delete l
from (
select l1.level, l1.id
from logging l1
where l1.level = @level
order by l1.time desc, l1.id desc
limit 1
) m
join logging l
on l.level = m.level
and l.id <> m.id
View on DB Fiddle
Here you would want to have an index on (level, time).
I've got 2 MySQL 5.7 databases hosted on the same server (we're migrating from one structure to another).
I want to delete all the rows from database1.table_x where there is a corresponding row in database2.table_y
The column which contains the data to match on is called code
I'm able to do a SELECT which returns everything that is expected - this is effectively the set of data I want to delete.
An example select would be:
SELECT *
FROM `database1`.`table_x`
WHERE `code` NOT IN (SELECT `code`
FROM `database2`.`table_y`);
This works and it returns 5 rows within 138ms.
--
However, if I change the SELECT to a DELETE, e.g.
DELETE
FROM `database1`.`table_x`
WHERE `code` NOT IN (SELECT `code`
FROM `database2`.`table_y`);
The query seems to hang - there are no errors returned, so I have to manually cancel the query after about 3 minutes.
--
Could anyone advise the most efficient/fastest way to achieve this?
Try it like below; it should work:
DELETE FROM table_a WHERE `code` NOT IN (
select * from
(
SELECT `code` FROM `second_database`.`table_b`
) as t
);
Try the following query:
DELETE a
FROM first_database.table_a AS a
LEFT JOIN second_database.table_b AS b ON b.code = a.code
WHERE b.code IS NULL;
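The same anti-join can also be written with NOT EXISTS. The sketch below uses the database and table names from the question and assumes that `code` is not yet indexed in `table_y`; the index name idx_table_y_code is made up.

```sql
-- Assumption: `code` has no index in table_y yet; skip this step if it does.
ALTER TABLE `database2`.`table_y` ADD INDEX idx_table_y_code (`code`);

-- Multi-table DELETE syntax, so that the alias x is accepted on MySQL 5.7.
DELETE x
FROM `database1`.`table_x` AS x
WHERE NOT EXISTS (
    SELECT 1
    FROM `database2`.`table_y` AS y
    WHERE y.`code` = x.`code`
);
```

One thing to check first: NOT IN and NOT EXISTS can disagree when `code` contains NULL on either side (a single NULL in the subquery makes NOT IN match nothing), so it is worth confirming that the SELECT versions of both forms return the same rows before deleting.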
I have a strange situation with a simple select by column pqth_scan_code from the following table:
table pqth_
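+----------------+--------------+------+-----+---------+----------------+
| Field          | Type         | Null | Key | Default | Extra          |
+----------------+--------------+------+-----+---------+----------------+
| pqth_id        | int(11)      | NO   | PRI | NULL    | auto_increment |
| pqth_scan_code | varchar(250) | NO   |     | NULL    |                |
| pqth_info      | text         | YES  |     | NULL    |                |
| pqth_opk       | int(11)      | NO   |     | 999     |                |
+----------------+--------------+------+-----+---------+----------------+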
query 1
This query took 12.7221 seconds to execute
SELECT * FROM `pqth_` WHERE pqth_scan_code = "7900722!30#3#6$EN"
query 2
This query took 0.0003 seconds to execute
SELECT * FROM `pqth` WHERE `pqth_id`=27597
Based on data from table pqth_ I have created the following table, where pqthc_id = pqth_id and pqthc_scan_code=pqth_scan_code
table pqthc
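+-----------------+----------+------+-----+---------+-------+
| Field           | Type     | Null | Key | Default | Extra |
+-----------------+----------+------+-----+---------+-------+
| pqthc_id        | int(11)  | NO   | PRI | NULL    |       |
| pqthc_scan_code | tinytext | NO   |     | NULL    |       |
+-----------------+----------+------+-----+---------+-------+
The same query (query 1) on table pqthc took 0.0259 seconds to run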
SELECT * FROM `pqthc` WHERE pqthc_scan_code = "7900722!30#3#6$EN"
If I run the following query, it takes 0.0971 seconds, which is very strange.
query 3
SELECT * FROM `pqth` WHERE pqth_id = (SELECT pqthc_id From pqthc where pqthc_scan_code = "7900722!30#3#6$EN")
My question is: why is a SELECT by pqth_scan_code slow while a SELECT by pqth_id is fast? Both columns are indexed.
For testing please get the export from this link
The same behavior is with MySQL and MariaDB server
SELECT * FROM `pqth_` WHERE pqth_scan_code = "7900722!30#3#6$EN"
needs INDEX(pqth_scan_code). Period. End of discussion.
SELECT * FROM `pqth` WHERE `pqth_id`=27597
has a useful index, since a PRIMARY KEY is an index (and it is unique).
SELECT * FROM `pqthc` WHERE pqthc_scan_code = "7900722!30#3#6$EN"
also needs INDEX(pqthc_scan_code). But it may have been faster because (1) the table is smaller, or (2) you ran the query before, thereby caching what was needed in RAM.
Please don't prefix column names with the table name.
Please don't have table names so close to each other that they are hard to distinguish. (pqth and pqthc)
SELECT *
FROM `pqth`
WHERE pqth_id =
( SELECT pqthc_id
From pqthc
where pqthc_scan_code = "7900722!30#3#6$EN"
)
The construct IN ( SELECT ... ) is not efficient.
It is rare to have two tables with the same PRIMARY KEY; are you sure you meant that?
Use a JOIN instead:
SELECT a.*
FROM `pqth` AS a
JOIN pqthc AS c ON a.id = c.id
where c.scan_code = "7900722!30#3#6$EN"
If that is 'correct', then I recommend this 'covering' index:
INDEX(scan_code, id)
instead of the shorter INDEX(scan_code) I previously recommended.
More on indexing.
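With the original column names, adding the recommended indexes might look like the sketch below. It assumes the large table is named `pqth` (the question writes it both as `pqth` and `pqth_`); the index names and the prefix length of 50 are arbitrary choices for illustration.

```sql
-- Index for the lookup by scan code on the large table.
ALTER TABLE `pqth` ADD INDEX idx_pqth_scan_code (`pqth_scan_code`);

-- pqthc_scan_code is TINYTEXT, and TEXT columns need a prefix length
-- in an index definition; 50 characters is an assumed value.
ALTER TABLE `pqthc` ADD INDEX idx_pqthc_scan_code (`pqthc_scan_code`(50));

-- Check whether the lookup now uses the new index.
EXPLAIN SELECT * FROM `pqth` WHERE `pqth_scan_code` = "7900722!30#3#6$EN";
```

If the server reports that the key on `pqth_scan_code` is too long for the table's character set and row format, a prefix length can be given for that column in the same way.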
You have to understand the concept of primary keys and indexes and how they help in searching;
see the reference docs here.
First of all, pqthc_scan_code has no index/key and pqthc_id does; keys help make searches faster.
Another difference is that pqthc_id is an integer whereas pqthc_scan_code is a string. Comparing integers is a lot more efficient than comparing strings.
You should avoid having to search on strings in really large tables.
You could add an index/key to pqthc_scan_code, but I don't know how much it will help.
You can put EXPLAIN in front of your query to try to figure out what takes so long. More info on EXPLAIN
I have a view : vcompanyendofday
The following query executes in just 0.7 secs
Select * from vcompanyendofday
But adding a simple WHERE condition to this query makes it take around 200.0 secs
select * from vcompanyendofday where companyid <= 51;
This is the view definition:
CREATE VIEW `vcompanyendofday` AS
select `c`.`companyid` AS `companyid`,
`c`.`scripcode` AS `scripcode`,
`e`.`eoddate` AS `eoddate`,
`e`.`prevclose` AS `prevclose`,
`e`.`delqty` AS `delqty`
from (
`company` `c`
left join
`endofday` `e`
on ((`c`.`companyid` = `e`.`companyid`)))
where (`e`.`eoddate` =
(
select max(`e2`.`eoddate`) AS `max(eoddate)`
from `endofday` `e2`
where (`e2`.`companyid` = `c`.`companyid`)
)
);
Seems you don't have an index on endofday.companyid
When you add the condition, company becomes leading in the join, and kills all performance.
Create an index on endofday.companyid:
CREATE INDEX ix_endofday_companyid ON endofday(companyid)
By the way, if you want all companies to be returned, you need to put the subquery into the ON clause of the OUTER JOIN; otherwise companies with no endofday rows will be filtered out:
CREATE VIEW `vcompanyendofday` AS
select `c`.`companyid` AS `companyid`,
`c`.`scripcode` AS `scripcode`,
`e`.`eoddate` AS `eoddate`,
`e`.`prevclose` AS `prevclose`,
`e`.`delqty` AS `delqty`
from (
`company` `c`
left join
`endofday` `e`
on `c`.`companyid` = `e`.`companyid`
AND `e`.`eoddate` =
(
select max(`e2`.`eoddate`) AS `max(eoddate)`
from `endofday` `e2`
where (`e2`.`companyid` = `c`.`companyid`)
)
);
Have you tried the select used to create the view by itself with the WHERE clause to see what happens?
If the problem happens with that, run EXPLAIN on that query to see what's happening.
At a guess, there's no index on companyid in one of the tables, most likely endofday.
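Those checks can be sketched as follows, using the view and table names from the question. The composite index at the end is an assumption that goes beyond the single-column index suggested above, and its name is made up.

```sql
-- List the indexes that currently exist on endofday.
SHOW INDEX FROM `endofday`;

-- See how the filtered query against the view is executed.
EXPLAIN
SELECT * FROM `vcompanyendofday` WHERE companyid <= 51;

-- Assumed variant: also covering eoddate may let the MAX(eoddate)
-- subquery be answered from the index for each company.
CREATE INDEX ix_endofday_companyid_eoddate ON `endofday` (`companyid`, `eoddate`);
```

Whether the extra column pays off depends on the data, so it is worth comparing the EXPLAIN output before and after creating the index.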