MySQL query usually fast, sometimes very slow (when run on my website) - mysql

I have a query on my MySQL database that is used to return products (on an e-commerce site) after the user performs a search in a free-form text box.
Until recently, user searches have been running fast. However, in the last few days, the searches are periodically very slow. This occurs for about 3 or 4 hours (spread randomly throughout the day) each day.
I had thought that there was a problem with my server. But now I have moved to another server, and the same thing still happens.
I suspect that the query I use is quite inefficient, and maybe this is the cause. But I don't understand why usually the query can run fast, and at other times be very slow. I would assume that an inefficient query would always be slow.
The query runs on two tables. If the search is for "blue jeans", then the query would look like this:
SELECT I.itemURL, I.itemTitle, I.itemPrice, I.itemReduced, I.itemFileName, I.itemBrand, I.itemStore, I.itemID, I.itemColour, I.itemSizes, I.itemBrandEn
FROM Item AS I, Search AS S
WHERE I.itemID = S.itemID
AND (S.searchStringTEXT LIKE '% blue %' OR S.searchStringTEXT LIKE 'blue %' OR S.searchStringTEXT LIKE '% blue')
AND (S.searchStringTEXT LIKE '% jeans %' OR S.searchStringTEXT LIKE 'jeans %' OR S.searchStringTEXT LIKE '% jeans')
Item is the table containing all products on the site. It has around 100,000 rows.
Search is a table containing product ids, and the tags associated to each product id. The tags are in the column "searchStringTEXT", and are separated by spaces. E.g., an entry in this column may be something like "jeans blue calvin klein small".
The search above will find all items that have both the tag "jeans" and "blue" attached.
In theory, Search should have the same number of rows as Item; but, due to a problem that I haven't fixed yet, it has about 500 fewer rows, so these items are effectively excluded from searches.
The create table details for both tables are as follows:
CREATE TABLE `Search` (
`itemID` int(11) NOT NULL,
`searchStringTEXT` varchar(255) DEFAULT NULL,
`searchStringVARCHAR` varchar(1000) DEFAULT NULL,
PRIMARY KEY (`itemID`),
KEY `indexSearch_837` (`itemID`) USING BTREE,
KEY `indexSearch_837_text` (`searchStringTEXT`)
) ENGINE=InnoDB DEFAULT CHARSET=latin5
and
CREATE TABLE `Item_8372` (
`itemID` int(11) NOT NULL AUTO_INCREMENT,
`itemURL` varchar(2000) DEFAULT NULL,
`itemTitle` varchar(500) DEFAULT NULL,
`itemFileName` varchar(200) DEFAULT NULL,
`itemPictureURL` varchar(2000) DEFAULT NULL,
`itemReduced` int(11) DEFAULT NULL,
`itemPrice` int(11) DEFAULT NULL,
`itemStore` varchar(500) DEFAULT NULL,
`itemBrand` varchar(500) CHARACTER SET latin1 DEFAULT NULL,
`itemShopCat` varchar(500) DEFAULT NULL,
`itemTimestamp` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`itemCat` varchar(200) DEFAULT NULL,
`itemSubCat` varchar(200) DEFAULT NULL,
`itemSubSubCat` varchar(200) DEFAULT NULL,
`itemSubSubSubCat` varchar(200) DEFAULT NULL,
`itemColour` varchar(200) DEFAULT NULL,
`itemSizes` varchar(200) DEFAULT NULL,
`itemBrandEn` varchar(500) DEFAULT NULL,
`itemReduction` float DEFAULT NULL,
`ItemPopularity` int(6) DEFAULT NULL,
PRIMARY KEY (`itemID`),
KEY `indexItem_8372_ts` (`itemTimestamp`) USING BTREE,
KEY `indexItem_8372_pop` (`ItemPopularity`),
KEY `indexItem_8372_red` (`itemReduction`),
KEY `indexItem_8372_price` (`itemReduced`)
) ENGINE=InnoDB AUTO_INCREMENT=970846 DEFAULT CHARSET=latin5
In the title of the question I say "(when run on my website)", because I find that the query's speed is consistent when run locally. But maybe this is just because I haven't tested it as much locally.
I'm thinking of changing the Search table so it's a MyISAM table, and then I can use a full text search instead of "LIKE". But I'd still like to figure out why I am experiencing what I am experiencing with the current setup.
Any ideas/suggestions much appreciated,
Andrew
Edit:
Here is the EXPLAIN result for the SELECT statement:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE I ALL PRIMARY NULL NULL NULL 81558
1 SIMPLE S eq_ref PRIMARY,indexSearch_837,indexSearch_837_text PRIMARY 4 I.itemID 1 Using where

Using LIKE patterns with leading wildcards will result in slow queries, especially with as many of them as you have here. A LIKE that starts with a wildcard cannot use an index, so MySQL scans every row in the table, which can cause a problem even with only a few hundred rows (and that work is multiplied by the number of LIKE variations you match against).
It only takes a few people simultaneously loading a page that runs this query to really slow your site down. Furthermore, the query may be running multiple times on your server in background threads from earlier, causing the longer-term slowdowns you're seeing.
If you're going to be performing text searches regularly, consider a RAM-based indexing solution for your search such as Sphinx. You could perform the text search there (it will be very fast compared to MySQL) and then retrieve the needed rows from the MySQL tables after.
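If you want to stay inside MySQL, a FULLTEXT index is the built-in alternative to all those LIKEs (and since InnoDB supports FULLTEXT from MySQL 5.6 onward, switching the table to MyISAM may not even be necessary). A minimal sketch against the Search table from the question; the index name is made up:
ALTER TABLE Search ADD FULLTEXT INDEX ft_searchString (searchStringTEXT);
SELECT I.itemURL, I.itemTitle, I.itemPrice, I.itemReduced, I.itemFileName,
       I.itemBrand, I.itemStore, I.itemID, I.itemColour, I.itemSizes, I.itemBrandEn
FROM Item AS I
JOIN Search AS S ON I.itemID = S.itemID
WHERE MATCH(S.searchStringTEXT) AGAINST('+blue +jeans' IN BOOLEAN MODE);
Note that words shorter than innodb_ft_min_token_size (3 by default) are not indexed, so very short tags would need extra handling.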

You might want to try using REGEXP (see the MySQL REGEXP documentation). It's also better practice to use explicit JOIN syntax rather than implicit comma joins.
SELECT I.itemurl,
I.itemtitle,
I.itemprice,
I.itemreduced,
I.itemfilename,
I.itembrand,
I.itemstore,
I.itemid,
I.itemcolour,
I.itemsizes,
I.itembranden
FROM item AS I
INNER JOIN search AS S
ON I.itemid = S.itemid
WHERE S.searchstringtext REGEXP '(^| )blue( |$)'
AND S.searchstringtext REGEXP '(^| )jeans( |$)'

Related

How to resolve update lock issue in MySQL

I have 2 MySQL UPDATE query problems on my website.
Problem 1
I run a content site that updates page views for posts when users read them.
Each time I send push notifications, my server times out; when I comment out the UPDATE query that increments the page views, everything returns to normal.
I think this may be the result of hundreds of UPDATE queries trying to update the views on the same row.
The query that updates the table:
update table set views='$newview' where id=1
Query Explain
id: 1
select_type: SIMPLE
table: new_jobs
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 1
Extra: Using where
tablename create table:
CREATE TABLE `tablename` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`company_id` int(11) DEFAULT NULL,
`job_title` varchar(255) DEFAULT NULL,
`slug` varchar(255) DEFAULT NULL,
`advert_date` date DEFAULT NULL,
`expiry_date` date DEFAULT NULL,
`no_deadline` int(1) DEFAULT 0,
`source` varchar(20) DEFAULT NULL,
`featured` int(1) DEFAULT 0,
`views` int(11) DEFAULT 1,
`email_status` int(1) DEFAULT 0,
`draft` int(1) DEFAULT 0,
`created_by` int(11) DEFAULT NULL,
`show_company_name` int(1) DEFAULT 1,
`display_application_method` int(1) DEFAULT 0,
`status` int(1) DEFAULT 1,
`upload_date` datetime DEFAULT NULL,
`country` int(1) DEFAULT 1,
`followers_email_status` int(1) DEFAULT 0,
`og_img` varchar(255) DEFAULT NULL,
`old_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `new_jobs_company_id_index` (`company_id`),
KEY `new_jobs_views_index` (`views`),
KEY `new_jobs_draft_index` (`draft`),
KEY `new_jobs_country_index` (`country`)
) ENGINE=InnoDB AUTO_INCREMENT=151359 DEFAULT CHARSET=utf8
What is the best way of handling this?
[Scenario 2 removed on request]
Scenario 1. I would expect the update of a 'view' count (or 'click' or 'like' or whatever) to be more like
UPDATE t SET views = views + 1 WHERE id = 123;
I assume you have an index (probably the PRIMARY KEY) on id?
Since there are other things going on with that table, it may be wise to split off the rapidly incrementing counter into a separate table. This would avoid interfering with other queries. You can get other data, plus the counter, by using JOIN .. USING(id).
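As a minimal sketch of that split (the post_views table name is made up; tablename and job_title come from your CREATE TABLE):
CREATE TABLE post_views (
  id int(11) NOT NULL,                -- same id as the main table
  views int(11) NOT NULL DEFAULT 0,
  PRIMARY KEY (id)
) ENGINE=InnoDB;
-- the hot statement, run with autocommit = ON
UPDATE post_views SET views = views + 1 WHERE id = 123;
-- read path: pick up the counter alongside the other data
SELECT j.job_title, v.views
FROM tablename AS j
JOIN post_views AS v USING (id)
WHERE j.id = 123;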
Scenario 2 does not make sense. It seems to keep the latest date for each email, but what does country mean? Since it seems like more than just a counter, you might want a separate table to log those 3 columns.
Please provide SHOW CREATE TABLE.
There are many things that novices perceive as a "crash". Please describe further -- out of connections, out of disk space, sluggishness, the client gave error message, other operations taking too long, etc. Each has a different remedy.
Query
Are you currently, logically, doing
BEGIN;
$ct = SELECT views ... FOR UPDATE;
...
UPDATE ... SET views = $ct+1 WHERE ...;
COMMIT;
If so, that is much less efficient than
(with autocommit = ON)
UPDATE ... SET views = views+1 ...;
Note that the first version hangs onto the row longer. If you fail to use FOR UPDATE, you will drop some counts.
Splitting into a separate table sort of forces you to run the UPDATE as its own transaction.
Other
innodb_flush_log_at_trx_commit:
Default is 1, which is the safest setting, but it costs at least one I/O operation per transaction.
A value of 2 leads to a flush once a second. During intense times this is much more efficient, but a crash could lose up to one second's worth of updates. The inaccuracy of the view count after a rare crash is, in my opinion, acceptable.
KEY(views) needs to be updated every time views changes. But, thanks to the "change buffer", this is unlikely to involve any extra I/O, at least not while you are doing the UPDATE.
INT(1) takes 4 bytes; the (1) has no meaning. Suggest changing to TINYINT (1 byte), thereby saving about 27 bytes per row (across 7 columns plus 2 indexes).
country INT(1) -- Is it a flag? What is the meaning? Is it normalized to another table? Using 4 bytes for an id and an extra table when standard abbreviations ('US', 'UK', 'RU', 'IN', etc) would take 2 bytes? Suggest country CHAR(2) CHARACTER SET ascii COLLATE ascii_general_ci.
Indexing flags rarely benefits. Let's see the queries where you think such indexes might be used. And the EXPLAIN SELECT ... for them.

Faster way to match a string in MySQL using replace

I have an interesting problem trying to select rows from a table where there are multiple possibilities for a VARCHAR column in my where clause.
Here's my table (which has around 7 million rows):
CREATE TABLE `search_upload_detailed_results` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`surId` bigint(20) DEFAULT NULL,
`company` varchar(100) DEFAULT NULL,
`country` varchar(45) DEFAULT NULL,
`clei` varchar(100) DEFAULT NULL,
`partNumber` varchar(100) DEFAULT NULL,
`mfg` varchar(100) DEFAULT NULL,
`cond` varchar(45) DEFAULT NULL,
`price` float DEFAULT NULL,
`qty` int(11) DEFAULT NULL,
`age` int(11) DEFAULT NULL,
`description` varchar(500) DEFAULT NULL,
`status` varchar(45) DEFAULT NULL,
`fileId` bigint(20) DEFAULT NULL,
`nmId` bigint(20) DEFAULT NULL,
`quoteRequested` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `sudr.surId` (`surId`),
KEY `surd.clei` (`clei`),
KEY `surd.pn` (`partNumber`),
KEY `surd.fileId` (`fileId`),
KEY `surd.price` (`price`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
I'm trying to match on the partNumber column. The problem is that the partNumber is stored in different formats, and can be entered in the search form in multiple formats.
Example: Part Number '300-1231-932' could be:
300-1231-932
3001231932
300 1231 932
A simple select like this takes 0.0008 seconds.
select avg(price) as price from search_upload_detailed_results where
partNumber LIKE '3001231932%' and price > 0;
But it doesn't give me all of the matches that I need. So I wrote this query.
select avg(price) as price from search_upload_detailed_results
where REPLACE(REPLACE(partNumber,'-',''),' ','') LIKE REPLACE(REPLACE('3001231932%','-',''),' ','') and price > 0;
This gives me all of the correct matches, but it's super slow at 3.3 seconds.
I played around with some things, trying to reduce the number of rows I'm doing the replace on, and came up with this.
select avg(price) as price from search_upload_detailed_results
where price > 0 AND
partNumber LIKE('300%') AND
REPLACE(REPLACE(partNumber,'-',''),' ','') LIKE REPLACE(REPLACE('3001231932%','-',''),' ','');
It takes 0.4 seconds to execute. Pretty fast, but could still be a bit time consuming in a multi-part search.
I would like to get it a little faster, but this is as far as I could get. Are there any other ways to optimize this query?
UPDATE to show explain for the 3rd query:
# id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1, SIMPLE, search_upload_detailed_results, range, surd.pn,surd.price, surd.pn, 103, , 89670, Using where
The obvious solution is to just store the part number with no extra characters in the table. Then remove these characters from the user input, and just do a simple WHERE partnumber = #input query.
If that's not possible, you can add the cleaned-up value as an additional column. In MySQL 5.7 you can use a generated column; in earlier versions you can use a trigger that fills in this column.
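A minimal sketch of the 5.7 generated-column route (the column and index names are invented for illustration):
ALTER TABLE search_upload_detailed_results
  ADD COLUMN partNumberClean varchar(100)
    AS (REPLACE(REPLACE(partNumber, '-', ''), ' ', '')) STORED,
  ADD KEY idx_pn_clean (partNumberClean);
-- strip the same characters from the user's input in application code,
-- then the prefix search can use the new index
SELECT AVG(price) AS price
FROM search_upload_detailed_results
WHERE partNumberClean LIKE '3001231932%' AND price > 0;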
I would like to get it a little faster, but this is as far as I could get. Are there any other ways to optimize this query?
As Barmar has said, the best solution if you really need speed (is 3.3s slow?) is to have a column containing the pre-transformed (and hopefully standardised) data, so you can query it without allowing for all the different part-number formats.
Example: Part Number '300-1231-932' could be:
300-1231-932
3001231932
300 1231 932
I think you should worry about the presentation of your data; having all those different 'formats' will make things difficult. Can you normalise to one standard format before it reaches the DB?
Here's my table (which has around 7 million rows):
Don't forget your index!
As mentioned elsewhere, the problem is the table format. If this is a non-negotiable then another alternative is:
If there are a few formats, but not too many, and they are well known (e.g. the three you've shown), then the query can be made to run faster by explicitly precalculating them all and searching for any of them.
select avg(price) as price from search_upload_detailed_results where
partNumber IN ('300-1231-932', '3001231932', '300 1231 932')
This will take the best advantage of the index you presumably have on partNumber.
You may find that MySQL can make good use of the indexes for carefully selected regular expressions.
select avg(price) as price from search_upload_detailed_results where
partNumber REGEXP '^300[- ]?1231[- ]?932';

Duplicating records in a mysql table intermittently causes statements to hang and not return

We are trying to duplicate existing records in a table: make 10 records out of one. The original table contains 75,000 records, and once the statements are done it will contain about 750,000 (10 times as many). The statements sometimes finish after 10 minutes, but many times they never return; hours later we receive a timeout. This happens about 1 out of 3 times. We are using a test database that nobody else is working on, so there is no concurrent access to the table. I don't see any way to optimise the SQL, since to me the EXPLAIN PLAN looks fine.
The database is mysql 5.5 hosted on AWS RDS db.m3.x-large. The CPU load goes up to 50% during the statements.
Question: What could cause this intermittent behaviour? How do I resolve it?
This is the SQL to create a temporary table, make roughly 9 new records per existing record in ct_revenue_detail in the temporary table, and then copy the data from the temporary table to ct_revenue_detail
---------------------------------------------------------------------------------------------------------
-- CREATE TEMPORARY TABLE AND COPY ROLL-UP RECORDS INTO TABLE
---------------------------------------------------------------------------------------------------------
CREATE TEMPORARY TABLE ct_revenue_detail_tmp
SELECT r.month,
r.period,
a.participant_eid,
r.employee_name,
r.employee_cc,
r.assignments_cc,
r.lob_name,
r.amount,
r.gp_run_rate,
r.unique_id,
r.product_code,
r.smart_product_name,
r.product_name,
r.assignment_type,
r.commission_pcent,
r.registered_name,
r.segment,
'Y' as account_allocation,
r.role_code,
r.product_eligibility,
r.revenue_core,
r.revenue_ict,
r.primary_account_manager_id,
r.primary_account_manager_name
FROM ct_revenue_detail r
JOIN ct_account_allocation_revenue a
ON a.period = r.period AND a.unique_id = r.unique_id
WHERE a.period = 3 AND lower(a.rollup_revenue) = 'y';
This is the second query. It copies the records from the temporary table back to the ct_revenue_detail TABLE
INSERT INTO ct_revenue_detail(month,
period,
participant_eid,
employee_name,
employee_cc,
assignments_cc,
lob_name,
amount,
gp_run_rate,
unique_id,
product_code,
smart_product_name,
product_name,
assignment_type,
commission_pcent,
registered_name,
segment,
account_allocation,
role_code,
product_eligibility,
revenue_core,
revenue_ict,
primary_account_manager_id,
primary_account_manager_name)
SELECT month,
period,
participant_eid,
employee_name,
employee_cc,
assignments_cc,
lob_name,
amount,
gp_run_rate,
unique_id,
product_code,
smart_product_name,
product_name,
assignment_type,
commission_pcent,
registered_name,
segment,
account_allocation,
role_code,
product_eligibility,
revenue_core,
revenue_ict,
primary_account_manager_id,
primary_account_manager_name
FROM ct_revenue_detail_tmp;
This is the EXPLAIN PLAN of the SELECT:
+----+-------------+-------+------+------------------------+--------------+---------+------------------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+------------------------+--------------+---------+------------------------------------+-------+-------------+
| 1 | SIMPLE | a | ref | ct_period,ct_unique_id | ct_period | 4 | const | 38828 | Using where |
| 1 | SIMPLE | r | ref | ct_period,ct_unique_id | ct_unique_id | 5 | optusbusiness_20160802.a.unique_id | 133 | Using where |
+----+-------------+-------+------+------------------------+--------------+---------+------------------------------------+-------+-------------+
This is the definition of ct_revenue_detail:
ct_revenue_detail | CREATE TABLE `ct_revenue_detail` (
`participant_eid` varchar(255) DEFAULT NULL,
`lob_name` varchar(255) DEFAULT NULL,
`amount` decimal(32,16) DEFAULT NULL,
`employee_name` varchar(255) DEFAULT NULL,
`period` int(11) NOT NULL DEFAULT '0',
`pk_id` int(11) NOT NULL AUTO_INCREMENT,
`gp_run_rate` decimal(32,16) DEFAULT NULL,
`month` int(11) DEFAULT NULL,
`assignments_cc` int(11) DEFAULT NULL,
`employee_cc` int(11) DEFAULT NULL,
`unique_id` int(11) DEFAULT NULL,
`product_code` varchar(50) DEFAULT NULL,
`smart_product_name` varchar(255) DEFAULT NULL,
`product_name` varchar(255) DEFAULT NULL,
`assignment_type` varchar(100) DEFAULT NULL,
`commission_pcent` decimal(32,16) DEFAULT NULL,
`registered_name` varchar(255) DEFAULT NULL,
`segment` varchar(100) DEFAULT NULL,
`account_allocation` varchar(25) DEFAULT NULL,
`role_code` varchar(25) DEFAULT NULL,
`product_eligibility` varchar(25) DEFAULT NULL,
`rollup` varchar(10) DEFAULT NULL,
`revised_amount` decimal(32,16) DEFAULT NULL,
`original_amount` decimal(32,16) DEFAULT NULL,
`comment` varchar(255) DEFAULT NULL,
`amount_revised_flag` varchar(255) DEFAULT NULL,
`exclude_segment` varchar(10) DEFAULT NULL,
`revenue_type` varchar(50) DEFAULT NULL,
`revenue_core` decimal(32,16) DEFAULT NULL,
`revenue_ict` decimal(32,16) DEFAULT NULL,
`primary_account_manager_id` varchar(100) DEFAULT NULL,
`primary_account_manager_name` varchar(100) DEFAULT NULL,
PRIMARY KEY (`pk_id`,`period`),
KEY `ct_participant_eid` (`participant_eid`),
KEY `ct_period` (`period`),
KEY `ct_employee_name` (`employee_name`),
KEY `ct_month` (`month`),
KEY `ct_segment` (`segment`),
KEY `ct_unique_id` (`unique_id`)
) ENGINE=InnoDB AUTO_INCREMENT=15338782 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY HASH (period)
PARTITIONS 120 */ |
Edit 29.9: The intermittent behaviour was caused by the omission of a DELETE statement: the original table was not cleared before the records were duplicated again. The first time, all is fine: we started with 75,000 records and ended up with 750,000.
Because the delete statement was missing, the next run already started with 750,000 records and turned them into 7.5M. That would still work, but the subsequent run, trying to turn 7.5M into 75M records, would fail. Hence the 1-in-3 failures.
We would then run all the scripts manually, and of course then we would clear the table properly, and all would go well. The reason we didn't see this beforehand is that our application does not output anything when running the SQL.
The real delay would be with your second query, inserting from the temporary table back into the original table. There are several issues here.
Sheer amount of data
Looking at your table, there are several varchar(255) columns; a conservative estimate would put the average length of your rows at 2KB. That's roughly 1.5GB being copied from one table to another, and being spread across different partitions! Partitioning makes reads more efficient, but for inserts the engine has to figure out which partition each row should go to, so it's actually writing to lots of different files instead of writing sequentially to one file. For spinning disks, this is slow.
Rebuilding the indexes
One of the biggest costs of inserts is rebuilding the indexes. In your case you have many of them.
KEY `ct_participant_eid` (`participant_eid`),
KEY `ct_period` (`period`),
KEY `ct_employee_name` (`employee_name`),
KEY `ct_month` (`month`),
KEY `ct_segment` (`segment`),
KEY `ct_unique_id` (`unique_id`)
And some of these indexes, like the one on employee_name, are on varchar(255) columns. That means pretty hefty indexes.
Solution part 1 - Normalize
Your database isn't normalized. Here is a classic example:
primary_account_manager_id varchar(100) DEFAULT NULL,
primary_account_manager_name varchar(100) DEFAULT NULL,
You should really have a table called account_manager containing these two fields. primary_account_manager_id should probably be an integer, and only that id should be stored in your ct_revenue_detail table.
Similarly you really shouldn't have employee_name, registered_name etc in this table. They should be in separate tables and they should be linked to ct_revenue_detail by foreign keys.
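For illustration only (names and types are guesses, not taken from your schema), the normalized shape would look something like this:
CREATE TABLE account_manager (
  id int(11) NOT NULL AUTO_INCREMENT,
  name varchar(100) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;
-- ct_revenue_detail keeps only an integer primary_account_manager_id,
-- and the name is recovered with a join when needed:
SELECT r.amount, am.name AS primary_account_manager_name
FROM ct_revenue_detail AS r
JOIN account_manager AS am ON am.id = r.primary_account_manager_id;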
Solution part 2 - Rethink indexes.
Do you need so many? MySQL generally uses only one index per table per query, so some of these indexes are probably never used. Is this one really needed:
KEY `ct_unique_id` (`unique_id`)
You already have a primary key; why do you need another unique column?
Indexes for the SELECT: For
SELECT ...
FROM ct_revenue_detail r
JOIN ct_account_allocation_revenue a
ON a.period = r.period AND a.unique_id = r.unique_id
WHERE a.period = 3 AND lower(a.rollup_revenue) = 'y';
a needs INDEX(period, rollup_revenue), in either order. However, you also need rollup_revenue to have a case-insensitive (..._ci) collation, and you need to avoid wrapping the column in a function: that is, change lower(a.rollup_revenue) = 'y' to a.rollup_revenue = 'y'.
r needs INDEX(period, unique_id), in either order. But, as e4c5 mentioned, if unique_id really is unique within this table, then take advantage of that.
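As a sketch, the two suggested indexes (index names are arbitrary, and I'm assuming nothing about ct_account_allocation_revenue beyond the columns used in the query):
ALTER TABLE ct_account_allocation_revenue
  ADD INDEX idx_period_rollup (period, rollup_revenue);
ALTER TABLE ct_revenue_detail
  ADD INDEX idx_period_uid (period, unique_id);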
Bulkiness is a problem when shoveling data around.
decimal(32,16) takes 16 bytes and gives you precision and range that are probably unnecessary. Consider FLOAT (4 bytes, ~7 significant digits, adequate range) or DOUBLE (8 bytes, ~16 significant digits, adequate range).
month int(11) takes 4 bytes. If that is just a value 1..12, then use TINYINT UNSIGNED (1 byte).
DEFAULT NULL -- I suspect most columns will never be NULL; if so, say NOT NULL for them.
amount_revised_flag varchar(255) -- if that is really a "flag", such as "yes"/"no", then use an ENUM and save lots of space.
It is uncommon to have both an id and a name in the same table (see primary_account_manager*); that is usually relegated to a "normalization table".
"Normalize" (already mentioned by #e4c5).
HASH partitioning
Hash partitioning is virtually useless. Unless you can justify it (preferably with a benchmark), I recommend removing partitioning. More discussion.
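If you do decide to drop it, the statement is a one-liner, though be aware it rebuilds the whole table:
ALTER TABLE ct_revenue_detail REMOVE PARTITIONING;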
Adding or removing partitioning usually involves changing the indexes. Please show us the main queries so we can help you build suitable indexes (especially composite indexes) for the queries.

MySQL UPDATE Statement using LIKE with 2 Tables Takes Decades

Can you please advise why such a query would take so long (literally 20-30 minutes)?
I seem to have proper indexes set up, don't I?
UPDATE `temp_val_import_435` t1,
`attr_upc` t2 SET t1.`attr_id` = t2.`id` WHERE t1.`value` LIKE t2.`upc`
CREATE TABLE `attr_upc` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`upc` varchar(255) NOT NULL,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `upc` (`upc`),
KEY `last_update` (`last_update`)
) ENGINE=InnoDB AUTO_INCREMENT=102739 DEFAULT CHARSET=utf8
CREATE TABLE `temp_val_import_435` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`attr_id` int(11) DEFAULT NULL,
`translation_id` int(11) DEFAULT NULL,
`source_value` varchar(255) NOT NULL,
`value` varchar(255) DEFAULT NULL,
`count` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `core_value_id` (`core_value_id`),
KEY `translation_id` (`translation_id`),
KEY `source_value` (`source_value`),
KEY `value` (`value`),
KEY `count` (`count`)
) ENGINE=InnoDB AUTO_INCREMENT=32768 DEFAULT CHARSET=utf8
Ed Cottrell's solution worked for me. Using = instead of LIKE sped up a smaller test query on 1000 rows by a lot.
I measured 2 ways: 1 in phpMyAdmin, the other looking at the time for DOM load (which of course involves other processes).
DOM load went from 44 seconds to 1 second, a 98% reduction.
But the difference in query execution time was much more dramatic, going from 43.4 seconds to 0.0052 seconds, a decrease of 99.988%. Pretty good. I will report back on results from huge datasets.
Use = instead of LIKE. = should be much faster than LIKE -- LIKE is only for matching patterns, as in '%something%', which matches anything with "something" anywhere in the text.
If you have this query:
SELECT * FROM myTable where myColumn LIKE 'blah'
MySQL can optimize this by pretending you typed myColumn = 'blah', because it sees that the pattern is fixed and has no wildcards. But what if you have this data in your upc column:
blah
foo
bar
%foo%
%bar
etc.
MySQL can't optimize your query in advance, because it's possible that the text it is trying to match is a pattern, like %foo%. So it has to evaluate the LIKE pattern for every single value of temp_val_import_435.value against every single value of attr_upc.upc. With a simple = and the indexes you have defined, this is unnecessary, and the query should be dramatically faster.
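For completeness, here is what the UPDATE from the question looks like with = on the join condition (a sketch, assuming exact matches are what you want):
UPDATE `temp_val_import_435` t1
INNER JOIN `attr_upc` t2
  ON t1.`value` = t2.`upc`
SET t1.`attr_id` = t2.`id`;
With equality, the optimizer can use the index on value and the unique index on upc instead of comparing every row against every row.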
In essence you are joining on a LIKE, which is going to be problematic (you would need EXPLAIN to see whether MySQL is using indexes at all). Try this:
UPDATE `temp_val_import_435` t1
INNER JOIN `attr_upc` t2
ON t1.`value` LIKE t2.`upc`
SET t1.`attr_id` = t2.`id`

GROUP BY Query - why so slow?

I am trying to run a GROUP BY query on a large table (more than 8 million rows). I can, however, reduce the need to group all the data by date: I have a view that captures the dates I require, and this limits the query, but it's not much better.
Finally I need to join to another table to pick up a field.
I am showing the query, the create on the main table and the query explain below.
Main Query:
SELECT pgi_raw_data.wsp_channel,
'IOM' AS wsp,
pgi_raw_data.dated,
pgi_accounts.`master`,
pgi_raw_data.event_id,
pgi_raw_data.breed,
Sum(pgi_raw_data.handle),
Sum(pgi_raw_data.payout),
Sum(pgi_raw_data.rebate),
Sum(pgi_raw_data.profit)
FROM pgi_raw_data
INNER JOIN summary_max
ON pgi_raw_data.wsp_channel = summary_max.wsp_channel
AND pgi_raw_data.dated > summary_max.race_date
INNER JOIN pgi_accounts
ON pgi_raw_data.account = pgi_accounts.account
GROUP BY pgi_raw_data.event_id
ORDER BY NULL
The create table:
CREATE TABLE `pgi_raw_data` (
`event_id` char(25) NOT NULL DEFAULT '',
`wsp_channel` varchar(5) NOT NULL,
`dated` date NOT NULL,
`time` time DEFAULT NULL,
`program` varchar(5) NOT NULL,
`track` varchar(25) NOT NULL,
`raceno` tinyint(2) NOT NULL,
`detail` varchar(30) DEFAULT NULL,
`ticket` varchar(20) NOT NULL DEFAULT '',
`breed` varchar(12) NOT NULL,
`pool` varchar(10) NOT NULL,
`gross` decimal(11,2) NOT NULL,
`refunds` decimal(11,2) NOT NULL,
`handle` decimal(11,2) NOT NULL,
`payout` decimal(11,4) NOT NULL,
`rebate` decimal(11,4) NOT NULL,
`profit` decimal(11,4) NOT NULL,
`account` mediumint(10) NOT NULL,
PRIMARY KEY (`event_id`,`ticket`),
KEY `idx_account` (`account`),
KEY `idx_wspchannel` (`wsp_channel`,`dated`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
This is my view for summary_max:
CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW
`summary_max` AS select `pgi_summary_tbl`.`wsp_channel` AS
`wsp_channel`,max(`pgi_summary_tbl`.`race_date`) AS `race_date`
from `pgi_summary_tbl` group by `pgi_summary_tbl`.`wsp_channel`
And also the evaluated query:
1 PRIMARY <derived2> ALL 6 Using temporary
1 PRIMARY pgi_raw_data ref idx_account,idx_wspchannel idx_wspchannel 7 summary_max.wsp_channel 470690 Using where
1 PRIMARY pgi_accounts ref PRIMARY PRIMARY 3 gf3data_momutech.pgi_raw_data.account 29 Using index
2 DERIVED pgi_summary_tbl ALL 42282 Using temporary; Using filesort
Any help on indexing would help.
At a minimum you need indexes on these fields:
pgi_raw_data.wsp_channel,
pgi_raw_data.dated,
pgi_raw_data.account
pgi_raw_data.event_id,
summary_max.wsp_channel,
summary_max.race_date,
pgi_accounts.account
The general (not always) rule is anything you are sorting, grouping, filtering or joining on should have an index.
Also: pgi_summary_tbl.wsp_channel
Also, why the order by null?
The first thing is to be sure that you have indexes on pgi_summary_tbl(wsp_channel, race_date) and pgi_accounts(account). For this query, you don't need indexes on these columns in the raw data.
MySQL has a tendency to use indexes even when they are not the most efficient path. I would start by looking at the performance of the "full" query, without the joins:
SELECT pgi_raw_data.wsp_channel,
'IOM' AS wsp,
pgi_raw_data.dated,
-- pgi_accounts.`master`,
pgi_raw_data.event_id,
pgi_raw_data.breed,
Sum(pgi_raw_data.handle),
Sum(pgi_raw_data.payout),
Sum(pgi_raw_data.rebate),
Sum(pgi_raw_data.profit)
FROM pgi_raw_data
GROUP BY pgi_raw_data.event_id
If this has better performance, you may have a situation where the indexes are working against you. The specific problem is called "thrashing". It occurs when a table is too big to fit into memory. Often, the fastest way to deal with such a table is to just read the whole thing. Accessing the table through an index can result in an extra I/O operation for most of the rows.
If this works, then do the joins after the aggregate. Also, consider getting more memory, so the whole table will fit into memory.
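A sketch of the join-after-aggregate shape (it keeps the same loose GROUP BY as the original query and, like the test query above, leaves out the summary_max filter):
SELECT agg.wsp_channel,
       'IOM' AS wsp,
       agg.dated,
       pgi_accounts.`master`,
       agg.event_id,
       agg.breed,
       agg.handle,
       agg.payout,
       agg.rebate,
       agg.profit
FROM (
    SELECT wsp_channel, dated, event_id, breed, account,
           Sum(handle) AS handle,
           Sum(payout) AS payout,
           Sum(rebate) AS rebate,
           Sum(profit) AS profit
    FROM pgi_raw_data
    GROUP BY event_id
) AS agg
INNER JOIN pgi_accounts
    ON pgi_accounts.account = agg.account;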
Second, if you have to deal with this type of data, then partitioning the table by date may prove to be a very useful option. This will allow you to significantly reduce the overhead of reading the large table. You do have to be sure that the summary table can be read the same way.