SQL - Strange issue with SELECT - mysql

I have a strange situation with a simple SELECT by the column pqth_scan_code on the following table:
table pqth_
Field           Type          Null  Key  Default  Extra
pqth_id         int(11)       NO    PRI  NULL     auto_increment
pqth_scan_code  varchar(250)  NO         NULL
pqth_info       text          YES        NULL
pqth_opk        int(11)       NO         999
query 1
This query took 12.7221 seconds to execute
SELECT * FROM `pqth_` WHERE pqth_scan_code = "7900722!30#3#6$EN"
query 2
This query took 0.0003 seconds to execute
SELECT * FROM `pqth` WHERE `pqth_id`=27597
Based on data from table pqth_ I have created the following table, where pqthc_id = pqth_id and pqthc_scan_code=pqth_scan_code
table pqthc
Field            Type      Null  Key  Default  Extra
pqthc_id         int(11)   NO    PRI  NULL
pqthc_scan_code  tinytext  NO         NULL
The same query, query 1, on table pqthc took 0.0259 seconds to run:
SELECT * FROM `pqthc` WHERE pqthc_scan_code = "7900722!30#3#6$EN"
If I run the following query, it takes 0.0971 seconds, which is very strange.
query 3
SELECT * FROM `pqth` WHERE pqth_id = (SELECT pqthc_id From pqthc where pqthc_scan_code = "7900722!30#3#6$EN")
My question is: why is a SELECT by pqth_scan_code slow while a SELECT by pqth_id is fast? Both columns are indexed.
For testing please get the export from this link
The same behavior occurs with both MySQL and MariaDB servers.

SELECT * FROM `pqth_` WHERE pqth_scan_code = "7900722!30#3#6$EN"
needs INDEX(pqth_scan_code). Period. End of discussion.
SELECT * FROM `pqth` WHERE `pqth_id`=27597
has a useful index, since a PRIMARY KEY is an index (and it is unique).
SELECT * FROM `pqthc` WHERE pqthc_scan_code = "7900722!30#3#6$EN"
also needs INDEX(pqthc_scan_code). But it may have been faster because (1) the table is smaller, or (2) you ran the query before, thereby caching what was needed in RAM.
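A minimal sketch of adding those indexes, assuming the table and column names shown above (the index names here are made up):
ALTER TABLE pqth_ ADD INDEX idx_pqth_scan_code (pqth_scan_code);  -- if this fails on index-length limits, use a prefix, e.g. pqth_scan_code(191)
ALTER TABLE pqthc ADD INDEX idx_pqthc_scan_code (pqthc_scan_code(191));  -- TINYTEXT requires a prefix length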
Please don't prefix column names with the table name.
Please don't have table names so close to each other that they are hard to distinguish. (pqth and pqthc)
SELECT *
FROM `pqth`
WHERE pqth_id =
( SELECT pqthc_id
From pqthc
where pqthc_scan_code = "7900722!30#3#6$EN"
)
The construct IN ( SELECT ... ) is not efficient.
It is rare to have two tables with the same PRIMARY KEY; are you sure you meant that?
Use a JOIN instead:
SELECT a.*
FROM `pqth` AS a
JOIN pqthc AS c ON a.id = c.id
where c.scan_code = "7900722!30#3#6$EN"
If that is 'correct', then I recommend this 'covering' index:
INDEX(scan_code, id)
instead of the shorter INDEX(scan_code) I previously recommended.
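A sketch of that covering index using the actual column names of the pqthc table above (the index name is made up; because pqthc_scan_code is TINYTEXT it needs a prefix length, which limits how fully the index can "cover"):
ALTER TABLE pqthc ADD INDEX idx_scan_code_id (pqthc_scan_code(191), pqthc_id);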
More on indexing.

You have to understand the concept of primary keys and indexes and how they help in searching;
reference docs here

First of all, pqthc_scan_code has no index/key while pqthc_id does, and keys make searches faster.
Another difference is that pqthc_id is an integer whereas pqthc_scan_code is a string. Comparing integers is a lot more efficient than comparing strings.
You should avoid having to search on strings in really large tables.
You could add an index/key to pqthc_scan_code, but I don't know how much it will help.
You can use EXPLAIN in front of your query to try to figure out what takes so long. More info on EXPLAIN
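For example, a sketch using the slow query from the question; check the key and rows columns of the output to see whether an index is being used:
EXPLAIN SELECT * FROM `pqth_` WHERE pqth_scan_code = "7900722!30#3#6$EN";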

Related

MYSQL ERROR CODE: 1288 - can't update with join statement

Thanks for past help.
While doing an update using a join, I am getting 'Error Code: 1288. The target table _____ of the UPDATE is not updatable' and can't figure out why. I can update the table with a simple update statement (UPDATE sales.customerABC SET contractID = 'x';) but not with a join like this:
UPDATE (
SELECT * #where '*' contains columns a.uniqueID and a.contractID
FROM sales.customerABC
WHERE contractID IS NULL
) as a
LEFT JOIN (
SELECT uniqueID, contractID
FROM sales.tblCustomers
WHERE contractID IS NOT NULL
) as b
ON a.uniqueID = b.uniqueID
SET a.contractID = b.contractID;
If I change that update statement to a SELECT such as:
SELECT * FROM (
SELECT *
FROM opwSales.dealerFilesCTS
WHERE pcrsContractID IS NULL
) as a
LEFT JOIN (
SELECT uniqueID, pcrsContractID
FROM opwSales.dealerFileLoad
WHERE pcrsContractID IS NOT NULL
) as b
ON a."Unique ID" = b.uniqueID;
the result table would contain these columns:
a.uniqueID, a.contractID, b.uniqueID, b.contractID
59682204, NULL, NULL, NULL
a3e8e81d, NULL, NULL, NULL
cfd1dbf9, NULL, NULL, NULL
5ece009c, , 5ece009c, B123
5ece0d04, , 5ece0d04, B456
5ece7ab0, , 5ece7ab0, B789
cfd21d2a, NULL, NULL, NULL
cfd22701, NULL, NULL, NULL
cfd23032, NULL, NULL, NULL
I have pretty much all database privileges, and I can't find any restrictions on the referenced tables. I can't find much information online concerning the error code, either.
Thanks in advance guys.
You cannot update a sub-select because it's not a "real" table - MySQL cannot easily determine how the sub-select assignment maps back to the originating table.
Try:
UPDATE customerABC
JOIN tblCustomers USING (uniqueID)
SET customerABC.contractID = tblCustomers.contractID
WHERE customerABC.contractID IS NULL AND tblCustomers.contractID IS NOT NULL
Notes:
You can use a plain inner JOIN instead of a LEFT JOIN, since you want uniqueID to exist and be non-NULL in both tables. A LEFT JOIN would generate extra NULL-extended rows from tblCustomers, only to have them shot down by the requirement that tblCustomers.contractID not be NULL. Since they allow more stringent use of indexes, inner JOINs tend to be more efficient than LEFT JOINs.
Since the field has the same name in both tables, you can replace ON (a.field1 = b.field1) with the USING (field1) shortcut.
You strongly want a covering index on (uniqueID, contractID) in both tables to maximize efficiency.
This is not going to work unless you have "real" tables for the update. tblCustomers may be a view or a subselect, but customerABC may not. You might need a more complicated JOIN to pull out a complex WHERE clause that would otherwise be hidden inside a subselect, if the original 'SELECT * FROM customerABC' was really more complex than a straight SELECT. What this boils down to is: MySQL needs a strong unique key to know what it needs to update, and that key must be in a single table. To reliably update more than one table, I think you need two UPDATEs inside a properly write-locked transaction.
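A sketch of those covering indexes, assuming the sales.customerABC and sales.tblCustomers tables from the question (the index names are made up):
ALTER TABLE sales.customerABC ADD INDEX idx_unique_contract (uniqueID, contractID);
ALTER TABLE sales.tblCustomers ADD INDEX idx_unique_contract (uniqueID, contractID);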

What is the most efficient way to know if a MySQL longblob is empty?

I have a MySQL table of around 150,000 rows, and a good half of them have a blob (image) stored in a longblob field. I'm trying to create a query that selects rows and includes a field which simply indicates whether the longblob (image) exists. Basically:
select ID, address, IF(house_image != '', 1, 0) AS has_image from homes where userid='1234';
That query times out after 300 seconds. If I remove the IF(house_image != '', 1, 0) part, it completes in less than a second. I've also tried the following, but they all time out.
IF(ISNULL(house_image),0,1) as has_image
LEFT (house_image,1) AS has_image
SUBSTRING(house_image,0,1) AS has_image
I am not a DBA (obviously), but I suspect that the query is reading the entire longblob just to determine whether it's empty or null.
Is there an efficient way to know if a field is empty?
Thanks for any assistance.
I had a similar problem a long time ago, and the workaround I ended up with was to move all blob/text columns into a separate table (bonus: this design allows multiple images per home). Once you've changed the design and moved the data around, you could do this:
select id, address, (
select 1
from home_images
where home_images.home_id = homes.id
limit 1
) as has_image -- will be 1 or null
from homes
where userid = 1234
PS: I make no guarantees. Depending on storage engine and row format, the blobs could get stored inline. If that is the case then reading the data will take much more disk IO than needed even if you're not "select"ing the blob column.
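A minimal sketch of such a child table, matching the hypothetical home_images / home_id names used in the subquery above:
CREATE TABLE home_images (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  home_id     INT UNSIGNED NOT NULL,   -- references homes.id
  house_image LONGBLOB NOT NULL,
  KEY idx_home_id (home_id)            -- makes the correlated "limit 1" lookup cheap
) ENGINE=InnoDB;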
It looks to me like you are treating the house_image column as a string when really you should be checking it for NULL.
select ID, address, IF(house_image IS NOT NULL, 1, 0) AS has_image
from homes where userid='1234';
LONGBLOBs can be indexed in MariaDB / MySQL, but the indexes are imperfect: they are so-called prefix indexes, and only consider the first bytes of the BLOB.
Try creating this compound index with a 20-byte prefix on your BLOB.
ALTER TABLE homes ADD INDEX user_image (userid, house_image(20));
Then this subquery will, efficiently, give you the IDs of rows with empty house_image columns.
SELECT ID
FROM homes
WHERE userid = '1234'
AND (house_image IS NULL OR house_image = '')
The prefix index can satisfy (house_image IS NULL OR house_image = '') directly without inspecting the BLOBs. That saves a whole mess of IO and CPU on your database server.
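To verify, you can EXPLAIN the subquery; if the index is being used, the key column of the output should name the user_image index created above:
EXPLAIN SELECT ID FROM homes WHERE userid = '1234' AND (house_image IS NULL OR house_image = '');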
You can then incorporate your subquery into a main query to get your result.
SELECT h.ID, h.address,
CASE WHEN empty.ID IS NULL THEN 1 ELSE 0 END AS has_image
FROM homes h
LEFT JOIN (
SELECT ID
FROM homes
WHERE userid = '1234'
AND (house_image IS NULL OR house_image = '')
) empty ON h.ID = empty.ID
WHERE h.userid = '1234'
The IS NULL ... LEFT JOIN trick means "any rows that do NOT show up in the subquery have images."

MySQL query with a subquery takes significantly longer when using a full text in a where, rather than an order by

I have a query which sometimes runs really fast and sometimes incredibly slowly depending on the number of results that match a full text boolean search within the query.
The query also contains a subquery.
Without the subquery the main query is always fast.
The subquery by itself is also always fast.
But together they are very slow.
Removing the full text search from a where clause and instead ordering by the full text search is really fast.
So it's only slow when using a full text search within a WHERE.
That's the simple readable overview, exact queries are below.
I've included the schema at the bottom although it will be difficult to replicate without my dataset which unfortunately I can't share.
I've included the counts and increments in the example queries to give some indication of the data size involved.
I actually have a solution: simply accept a result set that includes irrelevant data and then filter that data out in PHP. But I'd like to understand why my queries are performing poorly and how I might be able to resolve the issue in MySQL.
In particular I'm confused why it's fast with the full text search in an ORDER BY but not with it in the WHERE.
The query I want (slow)
I've got a query that looks like this:
select
*,
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
from
`app_records`
where
`id` in (
select
distinct(app_record_parents.record_id)
from
`app_group_records`
inner join `app_record_parents`
on `app_record_parents`.`parent_id` = `app_group_records`.`record_id`
where
`group_id` = 3
)
and
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE)
order by
`relevance_score` desc
limit
10;
This query takes 10 seconds.
This is too long for this sort of query, I need to be looking at milliseconds.
But the two queries run really fast when run by themselves.
The sub select by itself
select distinct(app_record_parents.record_id)
from
`app_group_records`
inner join
`app_record_parents`
on `app_record_parents`.`parent_id` = `app_group_records`.`record_id`
where
`group_id` = 3
The sub select by itself takes 7ms with 2600 results.
The main query without the sub select
select
*,
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
from
`app_records`
where
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE)
order by
`relevance_score` desc
limit
10;
The main query without the sub select takes 6ms with 2971 possible results (obviously there's a limit 10 there).
It's faster with fewer results
The same query but matching against "Old Traf" rather than "Old Tra" takes 300ms.
The number of results are obviously different when using "Old Traf" vs "Old Tra".
Results of full query
"Old Tra": 9
"Old Traf": 2
Records matching the full text search
"Old Tra": 2971
"Old Traf": 120
Removing the where solves the issue
Removing the where and returning all records sorted by the relevance score is really fast and still gives me the experience I'd like:
select
*,
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
from
`app_records`
where
`id` in (
select
distinct(app_record_parents.record_id)
from
`app_group_records`
inner join `app_record_parents`
on `app_record_parents`.`parent_id` = `app_group_records`.`record_id`
where
`group_id` = 3
)
order by
`relevance_score` desc
limit
10;
But then I need to filter out irrelevant results in code
I'm using this in PHP, so I can now filter my results to remove any that have a 0 relevance score (if there are only 2 matches, for instance, 8 random results with a relevance score of 0 will still be included, since I'm not using a where).
array_filter($results, function($result) {
return $result->relevance_score > 0;
});
Obviously this is really quick so it's not really a problem.
But I still don't understand what's wrong with my queries.
So I do have a fix as outlined above. But I still don't understand why my queries are slow.
It's clear that the number of possible results from the full text search is causing an issue, but exactly why and how to get around this issue is beyond me.
Table Schema
Here are my tables
CREATE TABLE `app_records` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`type` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `app_models_name_IDX` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=960004 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `app_record_parents` (
`record_id` int(10) unsigned NOT NULL,
`parent_id` int(10) unsigned DEFAULT NULL,
KEY `app_record_parents_record_id_IDX` (`record_id`) USING BTREE,
KEY `app_record_parents_parent_id_IDX` (`parent_id`) USING BTREE,
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `app_group_records` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`group_id` int(10) unsigned NOT NULL,
`record_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=31 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
A note on what the queries are doing
The subquery is getting a list of record_id's that belong to group_id 3.
So while there are 960004 records in app_records, there are only 2600 which belong to group 3, and it is against these 2600 that I'm trying to query for names that match "Old Tra".
So the subquery is getting a list of these 2600 record_id's, and then I'm doing a WHERE id IN <subquery> to get the relevant results from app_records.
EDIT: Using joins is equally slow
Just to add: using joins has the same issue, taking 10 seconds for "Old Tra" and 400ms for "Old Traf", and being very fast when not using a full text search in a where.
SELECT
app_records.*,
MATCH (NAME) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
FROM
`app_records`
INNER JOIN app_record_parents ON app_records.id = app_record_parents.record_id
INNER JOIN app_group_records ON app_group_records.record_id = app_record_parents.parent_id
WHERE
`group_id` = 3
AND MATCH (NAME) AGAINST ('Old Tra*' IN BOOLEAN MODE)
GROUP BY
app_records.id
LIMIT
10;
app_record_parents
Has no PRIMARY KEY; hence may have unnecessary duplicate pairs.
Does not have optimal indexes.
See this for several tips.
Perhaps app_group_records is also many-many?
Are you searching for Old Tra* anywhere in name? If not, then why not use WHERE name LIKE 'Old Tra%'? In that case, add INDEX(name).
Note: When FULLTEXT is involved, it is picked first. Please provide EXPLAIN SELECT to confirm this.
This formulation may be faster:
select *,
MATCH (r.name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
from `app_records` AS r
WHERE MATCH (r.name) AGAINST ('Old Tra*' IN BOOLEAN MODE)
AND EXISTS ( SELECT 1
FROM app_group_records AS gr
JOIN app_record_parents AS rp ON rp.parent_id = gr.record_id
WHERE gr.group_id = 3
AND r.id = rp.record_id )
ORDER BY relevance_score DESC
LIMIT 10
Indexes:
gr: (group_id, record_id) -- in this order
r: nothing but the FULLTEXT will be used
rp: (record_id, parent_id) -- in this order
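A sketch of adding those recommended indexes to the schema above (the index names are made up):
ALTER TABLE app_group_records ADD INDEX idx_group_record (group_id, record_id);
ALTER TABLE app_record_parents ADD INDEX idx_record_parent (record_id, parent_id);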

Too many cartesian products, making query to run slower

Consider that I have two tables, DETAILS and RATE, with the following columns:
DETAILS table:
CREATE TABLE DETAILS(
ID BIGINT PRIMARY KEY AUTO_INCREMENT,
PRICE1 DOUBLE,
PRICE2 DOUBLE,
PRICE3 DOUBLE,
CURRENCY VARCHAR(25),
CREATED_DATE DATE,
COMPLETED VARCHAR(50)
..................
Few more columns
);
RATE TABLE:
CREATE TABLE RATE(
ID BIGINT PRIMARY KEY AUTO_INCREMENT,
RATE DOUBLE,
CURRENCY VARCHAR(25),
CREATED_DATE DATE
..................
Few more columns
);
And I have an update query for the DETAILS table, as shown below.
UPDATE DETAILS D, RATE R
SET D.PRICE1=D.PRICE1*R.RATE,
D.PRICE2=D.PRICE2*R.RATE,
D.PRICE3=D.PRICE3*R.RATE
WHERE
D.CURRENCY=R.CURRENCY AND
DATE(D.CREATED_DATE) = DATE(R.CREATED_DATE) AND
D.COMPLETED IS NULL OR D.COMPLETED='ABC' AND
D.CURRENCY!='RUPEE';
The query used to work fine, but as the tables have grown it has started taking much longer, and the join is producing a cartesian product in the billions of rows.
Is there any way I can optimise this query?
Any help will be greatly appreciated.
Use explicit joins and fix the where clause using parentheses:
UPDATE DETAILS D JOIN
RATE R
ON D.CURRENCY=R.CURRENCY AND
DATE(D.CREATED_DATE) = DATE(R.CREATED_DATE)
SET D.PRICE1 = D.PRICE1*R.RATE,
D.PRICE2 = D.PRICE2*R.RATE,
D.PRICE3 = D.PRICE3*R.RATE
WHERE (D.COMPLETED IS NULL OR D.COMPLETED='ABC') AND
D.CURRENCY <> 'RUPEE';
The problem is the missing parentheses in the WHERE clause (AND binds more tightly than OR). However, you simply should not use commas to mean join.
DATE(D.CREATED_DATE) = DATE(R.CREATED_DATE)
Since those fields are DATE datatype, there is no need to use the DATE() function. In fact, doing so prevents use of an index.
Add INDEX(currency, created_date), at least to RATE. This, plus the above change, will greatly speed up the query.
Another improvement would be to make currency an ENUM or normalize it.
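Putting the two suggestions together, a sketch (assuming CREATED_DATE really is a DATE column in both tables, so the DATE() wrapper can simply be dropped):
ALTER TABLE RATE ADD INDEX idx_currency_date (CURRENCY, CREATED_DATE);
UPDATE DETAILS D
JOIN RATE R
  ON D.CURRENCY = R.CURRENCY
 AND D.CREATED_DATE = R.CREATED_DATE   -- no DATE() wrapper, so the index can be used
SET D.PRICE1 = D.PRICE1 * R.RATE,
    D.PRICE2 = D.PRICE2 * R.RATE,
    D.PRICE3 = D.PRICE3 * R.RATE
WHERE (D.COMPLETED IS NULL OR D.COMPLETED = 'ABC')
  AND D.CURRENCY <> 'RUPEE';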

Simple MySQL query runs very slow

I found some strange (for me) behaviour in MySQL. I have a simple query:
SELECT CONVERT( `text`.`old_text`
USING utf8 ) AS stext
FROM `text`
WHERE `text`.`old_id` IN
(
SELECT `revision`.`rev_text_id`
FROM `revision`
WHERE `revision`.`rev_id`
IN
(
SELECT `page_latest`
FROM `page`
WHERE `page_id` = 108
)
)
When I run it, phpMyAdmin shows an execution time of 77.0446 seconds.
But when I replace
WHERE `text`.`old_id` IN
by
WHERE `text`.`old_id` =
its execution time falls to about 0.001 sec. The result of this query
SELECT `revision`.`rev_text_id`
FROM `revision`
WHERE `revision`.`rev_id`
IN
(
SELECT `page_latest`
FROM `page`
WHERE `page_id` = 108
)
is
+------------+
|rev_text_id |
+------------+
|6506 |
+------------+
Can somebody please explain this behaviour?
Try adding an INDEX on the following columns:
ALTER TABLE `text` ADD INDEX idx_text (old_id);
ALTER TABLE `revision` ADD INDEX idx_revision (rev_text_id);
and execute the following query:
SELECT DISTINCT CONVERT(a.`old_text` USING utf8 ) AS stext
FROM `text` a
INNER JOIN `revision` b
ON a.`old_id` = b.`rev_text_id`
INNER JOIN `page` c
ON b.`rev_id` = c.`page_latest`
WHERE c.`page_id` = 108
PS: Can you also run the following queries and post their respective results?
DESC `text`;
DESC `revision`;
DESC `page`;
There are two primary ways you can increase your query performance here:
Add Indexes (such as Kuya mentioned)
Rid yourself of the subqueries where possible
For Indexes, add an index on the columns you are searching for your matches:
text.old_id, revision.rev_text_id & page.page_id
ALTER TABLE `text` ADD INDEX idx_text (old_id);
ALTER TABLE `revision` ADD INDEX idx_revision (rev_text_id);
ALTER TABLE `page` ADD INDEX idx_page (page_id);
Your next issue is that nested sub-selects are hell on your query execution plan. Here is a good thread discussing JOIN vs subquery. Here is an article on how to get execution plan info from MySQL.
First looks at an execution plan can be confusing, but it will be your best friend when you have to concern yourself with query optimization.
Here is an example of your same query with just joins (you could use inner or left and get pretty much the same result). I don't have your tables or data, so forgive syntax issues (there is no way I can verify the code works verbatim in your environment, but it should give you a good starting point).
SELECT
CONVERT( `text`.`old_text` USING utf8 ) AS stext
FROM `text`
-- inner join only returns rows when it can find a
-- matching `revision`.`rev_text_id` row to `text`.`old_id`
INNER JOIN `revision`
ON `text`.`old_id` = `revision`.`rev_text_id`
-- Inner Join only returns rows when it can find a
-- matching `page_latest` row to `page_latest`
INNER JOIN `page`
ON `revision`.`rev_id` = `page`.`page_latest`
WHERE `page`.`page_id` = 108
MySQL is looping through each result of the inner query and comparing it with each record in the outer query.
In the second inner query:
WHERE `revision`.`rev_id`
IN
( SELECT `page_latest`
FROM `page`
WHERE `page_id` = 108 )
You should definitely use '=' instead of IN: since you're selecting a single record, there is no point in looping through a result set when you know only one row will be returned each time.
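A sketch of that change applied to the original query; this is only valid while each subquery is guaranteed to return exactly one row:
SELECT CONVERT(`text`.`old_text` USING utf8) AS stext
FROM `text`
WHERE `text`.`old_id` =
    ( SELECT `revision`.`rev_text_id`
      FROM `revision`
      WHERE `revision`.`rev_id` =
          ( SELECT `page_latest`
            FROM `page`
            WHERE `page_id` = 108 ) );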