Optimizing the mysql query - Avoid creation of temporary table? - mysql

This is the query that I am using on the tables products, reviews, replies and review_images.
Query :
SELECT products.id, reviews.*,
GROUP_CONCAT(DISTINCT CONCAT_WS('~',replies.reply, replies.time)) AS Replies,
GROUP_CONCAT(DISTINCT CONCAT_WS('~',review_images.image_title, review_images.image_location)) AS ReviewImages
FROM products
LEFT JOIN reviews on products.id = reviews.product_id
LEFT JOIN replies on reviews.id = replies.review_id
LEFT JOIN review_images on reviews.id = review_images.review_id
WHERE products.id = 1
GROUP BY products.id, reviews.id;
Schema :
Products :
id | name | product_details....
Reviews :
id | product_id | username | review | time | ...
Replies :
id | review_id | username | reply | time | ...
Review Images :
id | review_id | image_title | image_location | ...
Indexes:
Products :
PRIMARY KEY - id
Reviews :
PRIMARY KEY - id
FOREIGN KEY - product_id (id IN products table)
FOREIGN KEY - username (username IN users table)
Replies :
PRIMARY KEY - id
FOREIGN KEY - review_id (id IN reviews table)
FOREIGN KEY - username (username IN users table)
Review Images :
PRIMARY KEY - id
FOREIGN KEY - review_id (id IN reviews table)
Explain Query :
id | select_type | table | type | possible_keys | rows | extra
1 | SIMPLE | products | index | null | 1 | Using index; Using temporary; Using filesort
1 | SIMPLE | reviews | ALL | product_id | 4 | Using where; Using join buffer (Block Nested Loop)
1 | SIMPLE | replies | ref | review_id | 1 | Null
1 | SIMPLE | review_images | ALL | review_id | 5 | Using where; Using join buffer (Block Nested Loop)
I don't understand what is wrong here: why does it need to use filesort and create a temporary table?
Here are a few profiling results:
Opening Tables 140 µs
Init 139 µs
System Lock 34 µs
Optimizing 21 µs
Statistics 106 µs
Preparing 146 µs
Creating Tmp Table 13.6 ms
Sorting Result 27 µs
Executing 11 µs
Sending Data 11.6 ms
Creating Sort Index 1.4 ms
End 89 µs
Removing Tmp Table 8.9 ms
End 34 µs
Query End 25 µs
Closing Tables 66 µs
Freeing Items 41 µs
Removing Tmp Table 1.4 ms
Freeing Items 46 µs
Removing Tmp Table 1.2 ms
Freeing Items 203 µs
Cleaning Up 55 µs
From the EXPLAIN and profiling results, it is clear that a temporary table is created to produce the results. How can I optimize this query to get the same results with better performance, without creating a temporary table?
Help would be appreciated. Thanks in advance.
EDIT
Create Tables
CREATE TABLE `products` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
`description` varchar(100) NOT NULL,
`items` int(11) NOT NULL,
`price` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB
CREATE TABLE `reviews` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(30) NOT NULL,
`product_id` int(11) NOT NULL,
`review` text NOT NULL,
`time` datetime NOT NULL,
`ratings` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `product_id` (`product_id`),
KEY `username` (`username`)
) ENGINE=InnoDB
CREATE TABLE `replies` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`review_id` int(11) NOT NULL,
`username` varchar(30) NOT NULL,
`reply` text NOT NULL,
`time` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `review_id` (`review_id`)
) ENGINE=InnoDB
CREATE TABLE `review_images` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`review_id` int(11) NOT NULL,
`image_title` text NOT NULL,
`image_location` text NOT NULL,
PRIMARY KEY (`id`),
KEY `review_id` (`review_id`)
) ENGINE=InnoDB
EDIT:
I simplified the query above, and now it does not create temporary tables. As @Bill Karwin mentioned, the only reason was that my GROUP BY included a column from the second table in the joins.
Simplified query :
SELECT reviews.*,
GROUP_CONCAT( DISTINCT CONCAT_WS( '~', replies.reply, replies.time ) ) AS Replies,
GROUP_CONCAT( DISTINCT CONCAT_WS( '~', review_images.image_title, review_images.image_location ) ) AS ReviewImages
FROM reviews
LEFT JOIN replies ON reviews.id = replies.review_id
LEFT JOIN review_images ON reviews.id = review_images.review_id
WHERE reviews.product_id = 1
GROUP BY reviews.id
Now the PROBLEM that I'm facing is :
Because I'm using GROUP_CONCAT, there is a limit to the data it can hold, set by the variable group_concat_max_len. Since I'm concatenating the replies given by users, the result could get very long and exceed that limit. I know I can change group_concat_max_len for the current session, but there is still a limit, so at some point the query may fail or be unable to fetch complete results.
How can I modify my query so that it does not use GROUP_CONCAT and still gets the expected results?
POSSIBLE SOLUTION:
Simply using LEFT JOINs? That creates duplicate rows for every new result in the last column, which makes the result hard to traverse in PHP. Any suggestions?
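If you do go the plain-LEFT-JOIN route, the flat rows can be folded back into one nested structure per review. The sketch below is a minimal example written in Python for illustration, and the same loop translates directly to PHP. It assumes each joined row arrives as a dict with the hypothetical keys review_id, review, reply_id, reply, image_id and image_title, and that the LEFT JOIN's NULLs arrive as None. Each review is repeated once per combination of reply and image, so replies and images are de-duplicated by their own ids.

```python
from typing import Any


def fold_joined_rows(rows: list[dict[str, Any]]) -> dict[int, dict[str, Any]]:
    """Collapse flat LEFT JOIN rows into one nested dict per review."""
    reviews: dict[int, dict[str, Any]] = {}
    for row in rows:
        # The first time a review id is seen, create its entry; later rows reuse it.
        review = reviews.setdefault(
            row["review_id"],
            {"review": row["review"], "replies": {}, "review_images": {}},
        )
        # A LEFT JOIN yields None when a review has no reply or no image.
        # Keying by id drops the repeats caused by the reply/image cross product.
        if row["reply_id"] is not None:
            review["replies"][row["reply_id"]] = row["reply"]
        if row["image_id"] is not None:
            review["review_images"][row["image_id"]] = row["image_title"]
    return reviews


# Review 1 has 2 replies and 2 images, so the join returns 2 x 2 = 4 rows for it.
KEYS = ("review_id", "review", "reply_id", "reply", "image_id", "image_title")
rows = [dict(zip(KEYS, values)) for values in [
    (1, "Great", 10, "Thanks", 20, "front"),
    (1, "Great", 10, "Thanks", 21, "back"),
    (1, "Great", 11, "Agreed", 20, "front"),
    (1, "Great", 11, "Agreed", 21, "back"),
    (2, "Meh", None, None, None, None),  # review with no replies and no images
]]

result = fold_joined_rows(rows)
print(result[1]["replies"])  # {10: 'Thanks', 11: 'Agreed'}
```

Keying the nested collections by id is what removes the duplicates. With many replies and images per review, the joined result grows multiplicatively (replies times images per review).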
I see this question is not getting much response from SO members, but I've been looking for a solution and reading up on the concepts since the week before last, still with no luck. I hope some of you pros can help me out. Thanks in advance.

You can't avoid creating a temporary table when your GROUP BY clause references columns from two different tables.
The only way to avoid the temporary table in this query is to store a denormalized version of the data in one table, and index the two columns you're grouping by.
Another way you can simplify and get results in a format that's easier to work with in PHP is to do multiple queries, without GROUP BY.
First get the reviews. Example is in PHP & PDO, but the principle applies to any language.
$review_stmt = $pdo->query("
SELECT reviews.*
FROM reviews
WHERE reviews.product_id = 1");
Arrange them in an associative array keyed by the review id.
$reviews = array();
while ($row = $review_stmt->fetch(PDO::FETCH_ASSOC)) {
$reviews[$row['id']] = $row; // key each review by its primary key
}
Then get the replies and append them to an array under the key 'replies'. Use INNER JOIN instead of LEFT JOIN, because it's okay if there are no replies: a review without replies simply gets no 'replies' element.
$reply_stmt = $pdo->query("
SELECT replies.*
FROM reviews
INNER JOIN replies ON reviews.id = replies.review_id
WHERE reviews.product_id = 1");
while ($row = $reply_stmt->fetch(PDO::FETCH_ASSOC)) {
$reviews[$row['review_id']]['replies'][] = $row;
}
And do the same for review_images.
$image_stmt = $pdo->query("
SELECT review_images.*
FROM reviews
INNER JOIN review_images ON reviews.id = review_images.review_id
WHERE reviews.product_id = 1");
while ($row = $image_stmt->fetch(PDO::FETCH_ASSOC)) {
$reviews[$row['review_id']]['review_images'][] = $row;
}
The end result is an array of reviews. Each review contains nested arrays of its related replies and images.
The efficiency of running simpler queries can make up for the extra work of running three queries. Plus you don't have to write code to explode() the group-concatted strings.
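The three-query approach above can be sketched end to end with Python's built-in sqlite3 module standing in for MySQL and PDO. The schema is a cut-down, hypothetical version of the question's tables. SQLite will not reproduce MySQL's plans or timings, so the sketch only shows the shape of the result. The ORDER BY clauses are an addition so the nested lists come back in a predictable order.

```python
import sqlite3
from typing import Any


def load_reviews(conn: sqlite3.Connection, product_id: int) -> dict[int, dict[str, Any]]:
    """Build {review_id: review} with nested 'replies' / 'review_images' lists."""
    reviews: dict[int, dict[str, Any]] = {}

    # Query 1: the reviews themselves, keyed by primary key.
    for row in conn.execute(
        "SELECT * FROM reviews WHERE product_id = ? ORDER BY id", (product_id,)
    ):
        reviews[row["id"]] = dict(row)

    # Query 2: replies. INNER JOIN is enough; a review without replies
    # simply gets no 'replies' key, as in the PHP version.
    for row in conn.execute(
        """SELECT replies.* FROM reviews
           INNER JOIN replies ON reviews.id = replies.review_id
           WHERE reviews.product_id = ? ORDER BY replies.id""",
        (product_id,),
    ):
        reviews[row["review_id"]].setdefault("replies", []).append(dict(row))

    # Query 3: images, same pattern.
    for row in conn.execute(
        """SELECT review_images.* FROM reviews
           INNER JOIN review_images ON reviews.id = review_images.review_id
           WHERE reviews.product_id = ? ORDER BY review_images.id""",
        (product_id,),
    ):
        reviews[row["review_id"]].setdefault("review_images", []).append(dict(row))

    return reviews


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name
conn.executescript("""
CREATE TABLE reviews (id INTEGER PRIMARY KEY, product_id INTEGER NOT NULL, review TEXT NOT NULL);
CREATE TABLE replies (id INTEGER PRIMARY KEY, review_id INTEGER NOT NULL, reply TEXT NOT NULL);
CREATE TABLE review_images (id INTEGER PRIMARY KEY, review_id INTEGER NOT NULL,
                            image_title TEXT NOT NULL);
CREATE INDEX idx_reviews_product ON reviews (product_id);
CREATE INDEX idx_replies_review ON replies (review_id);
CREATE INDEX idx_images_review ON review_images (review_id);
INSERT INTO reviews VALUES (1, 1, 'Great'), (2, 1, 'Meh'), (3, 2, 'Other product');
INSERT INTO replies VALUES (10, 1, 'Thanks'), (11, 1, 'Agreed');
INSERT INTO review_images VALUES (20, 1, 'front');
""")

result = load_reviews(conn, 1)
```

Each query is a simple indexed lookup with no GROUP BY. No string has to be split afterwards, and group_concat_max_len does not apply.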

Related

Optimise comparing data in two big MySQL tables

How could I optimise a query that finds all records which:
have activation_request.date_confirmed not null
and
do not have a related string value in another table, i.e. the condition activation_request.email =
user.username should not match any record?
I tried:
SELECT email
FROM activation_request l
LEFT JOIN user r ON r.username = l.email
WHERE l.date_confirmed is not null
AND r.username IS NULL
and
SELECT email
FROM activation_request
WHERE date_confirmed is not null
AND NOT EXISTS (SELECT 1
FROM user
WHERE user.username = activation_request.email
)
but both tables have xxx.xxx.xxx records, so even after running those queries all night I unfortunately haven't got any results.
Create statements:
CREATE TABLE `activation_request` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`version` bigint(20) NOT NULL,
`date_confirmed` datetime DEFAULT NULL,
`email` varchar(255) NOT NULL,
(...)
PRIMARY KEY (`id`),
KEY `emailIdx` (`email`),
KEY `reminderSentIdx` (`date_reminder_sent`),
KEY `idx_resent_needed` (`date_reminder_sent`,`date_confirmed`),
) ENGINE=InnoDB AUTO_INCREMENT=103011867 DEFAULT CHARSET=utf8;
CREATE TABLE `user` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`version` bigint(20) NOT NULL,
`username` varchar(255) NOT NULL,
(...)
PRIMARY KEY (`id`),
UNIQUE KEY `Q52plW9W7TJWZcLj00K3FmuhwMSw4F7vmxJGyjxz5iiINVR9fXyacEoq4rHppb` (`username`),
) ENGINE=InnoDB AUTO_INCREMENT=431400048 DEFAULT CHARSET=latin1;
Explain for LEFT JOIN:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | l | ALL | NULL | NULL | NULL | NULL | 49148965 | Using where
1 | SIMPLE | r | index | NULL | Q52plW9W7TJWZcLj00K3FmuhwMSw4F7vmxJGyjxz5iiINVR9fXyacEoq4rHppb | 257 | NULL | 266045508 | Using where; Not exists; Using index; Using join buffer (Block Nested Loop)
After adding indexes on the staging DB (slightly less data, but the same structure), the query has now been running for ~24h and still returns no results:
$ show processlist;
| Id | User | Host | db | Command | Time | State | Info
| 64 | root | localhost | staging_db | Query | 110072 | Sending data | SELECT ar.email FROM activation_request ar WHERE ar.date_confirmed is not null AND NOT EXISTS (SELE |
Mysql version:
$ select version();
5.6.16-1~exp1
All other commands on the list are Sleep so there is no other query running and possibly disturbing/locking rows.
For this query:
SELECT ar.email
FROM activation_request ar
WHERE ar.date_confirmed is not null AND
NOT EXISTS (SELECT 1
FROM user u
WHERE u.username = ar.email
)
I would recommend indexes on activation_request(date_confirmed, email) and user(username).
Unless you have a really humongous amount of data, though, your problem may be that tables are locked.
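Here is a small, runnable illustration of the recommended indexes and of the two equivalent anti-join forms from the question. It uses Python's sqlite3 module as a stand-in for MySQL 5.6, with a trimmed, hypothetical version of the question's schema. At this size it says nothing about performance. SQLite's EXPLAIN QUERY PLAN format also differs from MySQL's EXPLAIN, so the plan is only printed for inspection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activation_request (
    id INTEGER PRIMARY KEY,
    date_confirmed TEXT,            -- NULL means "not confirmed"
    email TEXT NOT NULL
);
CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
-- Indexes recommended in the answer.
CREATE INDEX idx_ar_confirmed_email ON activation_request (date_confirmed, email);
CREATE UNIQUE INDEX idx_user_username ON user (username);
INSERT INTO activation_request (date_confirmed, email) VALUES
    ('2014-01-01', 'a@example.com'),   -- has a matching user
    ('2014-01-02', 'b@example.com'),   -- confirmed, no user: should be returned
    (NULL,         'c@example.com');   -- not confirmed: filtered out
INSERT INTO user (username) VALUES ('a@example.com');
""")

NOT_EXISTS = """
SELECT ar.email FROM activation_request ar
WHERE ar.date_confirmed IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM user u WHERE u.username = ar.email)
ORDER BY ar.email"""

LEFT_JOIN = """
SELECT l.email FROM activation_request l
LEFT JOIN user r ON r.username = l.email
WHERE l.date_confirmed IS NOT NULL AND r.username IS NULL
ORDER BY l.email"""

not_exists_rows = conn.execute(NOT_EXISTS).fetchall()
left_join_rows = conn.execute(LEFT_JOIN).fetchall()

# Print the plan to check whether the username index is used for each probe.
for step in conn.execute("EXPLAIN QUERY PLAN " + NOT_EXISTS):
    print(step)
```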

MySql group by optimization - avoid tmp table and/or filesort

I have a slow query: without the GROUP BY it is fast (0.1-0.3 seconds), but with the (required) GROUP BY it takes around 10-15 seconds.
The query joins two tables, events (nearly 50 million rows) and events_locations (5 million rows).
Query:
SELECT `e`.`id` AS `event_id`,`e`.`time_stamp` AS `time_stamp`,`el`.`latitude` AS `latitude`,`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,`e`.`entity_id` AS `asset_name`, `el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id`AS `entity_type_id`, el.some_id
FROM events e
INNER JOIN events_locations el ON el.event_id = e.id
WHERE 1=1
AND el.other_id = '1'
AND time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
GROUP BY `e`.`event_type_id` , `el`.`some_id` , `el`.`group_alias`;
Table events:
CREATE TABLE `events` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`event_type_id` int(11) NOT NULL,
`entity_type_id` int(11) NOT NULL,
`entity_id` varchar(64) NOT NULL,
`alias` varchar(64) NOT NULL,
`time_stamp` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `entity_id` (`entity_id`),
KEY `event_type_idx` (`event_type_id`),
KEY `idx_events_time_stamp` (`time_stamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Table events_locations
CREATE TABLE `events_locations` (
`event_id` bigint(20) NOT NULL,
`latitude` double NOT NULL,
`longitude` double NOT NULL,
`some_id` bigint(20) DEFAULT NULL,
`other_id` bigint(20) DEFAULT NULL,
`time_span` bigint(20) DEFAULT NULL,
`group_alias` varchar(64) NOT NULL,
KEY `some_id_idx` (`some_id`),
KEY `idx_events_group_alias` (`group_alias`),
KEY `idx_event_id` (`event_id`),
CONSTRAINT `fk_event_id` FOREIGN KEY (`event_id`) REFERENCES `events` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The explain:
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| 1 | SIMPLE | ea | ALL | 'idx_event_id' | NULL | NULL | NULL | 5152834 | 'Using where; Using temporary; Using filesort' |
| 1 | SIMPLE | e | eq_ref | 'PRIMARY,idx_events_time_stamp' | PRIMARY | '8' | 'name.ea.event_id' | 1 | |
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
2 rows in set (0.08 sec)
From the doc:
Temporary tables can be created under conditions such as these:
If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.
DISTINCT combined with ORDER BY may require a temporary table.
If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.
I already tried:
Creating an index on (el.some_id, el.group_alias)
Decreasing the varchar size to 20
Increasing sort_buffer_size and read_rnd_buffer_size
Any suggestions for performance tuning would be much appreciated!
In your case, the events table has an index on time_stamp. So before joining the two tables, first select the required records from the events table for the specific date range, with only the columns you need. Then join events_locations using the relation between the tables.
Use MySQL's EXPLAIN to check how MySQL accesses your table records. It tells you how many rows are scanned before the required records are selected.
The number of rows scanned also contributes to query execution time. The logic below reduces the number of rows that are scanned.
SELECT
`e`.`event_id` AS `event_id`,
`e`.`time_stamp` AS `time_stamp`,
`el`.`latitude` AS `latitude`,
`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,
`e`.`asset_name` AS `asset_name`,
`el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,
`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id` AS `entity_type_id`,
`el`.`some_id` as `some_id`
FROM
(select
`id` AS `event_id`,
`time_stamp` AS `time_stamp`,
`entity_id` AS `asset_name`,
`event_type_id` AS `event_type_id`,
`entity_type_id` AS `entity_type_id`
from
`events`
WHERE
time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
) AS `e`
JOIN `events_locations` `el` ON `e`.`event_id` = `el`.`event_id`
WHERE
`el`.`other_id` = '1'
GROUP BY
`e`.`event_type_id` ,
`el`.`some_id` ,
`el`.`group_alias`;
The relationship between these tables is 1:1, so I asked myself why a GROUP BY was required. I found some duplicated rows: 200 in 50,000 rows. Somehow my system is inserting duplicates, and someone added that GROUP BY (years ago) instead of looking for the bug.
So, I will mark this as solved, more or less...
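The real problem turned out to be duplicate rows in a table that should be 1:1. The sketch below shows one way to find them and then stop new ones from slipping in: group by the join key, and once the data is clean, add a UNIQUE key so the buggy insert fails instead of silently duplicating. Python's sqlite3 stands in for MySQL, and only two columns are kept. The cleanup step relies on SQLite's implicit rowid, because events_locations has no primary key in the question; in MySQL that step would need a different approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events_locations (event_id INTEGER NOT NULL, group_alias TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO events_locations VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "b"), (3, "c")],  # event 2 was inserted twice
)

# 1) Find join keys that break the intended 1:1 relationship.
dupes = conn.execute(
    "SELECT event_id, COUNT(*) FROM events_locations "
    "GROUP BY event_id HAVING COUNT(*) > 1"
).fetchall()
print(dupes)  # [(2, 2)]

# 2) SQLite-specific cleanup: keep the lowest implicit rowid per event_id.
conn.execute(
    "DELETE FROM events_locations WHERE rowid NOT IN "
    "(SELECT MIN(rowid) FROM events_locations GROUP BY event_id)"
)

# 3) Enforce 1:1 so the buggy code path fails loudly instead of duplicating.
conn.execute("CREATE UNIQUE INDEX uq_event_id ON events_locations (event_id)")
try:
    conn.execute("INSERT INTO events_locations VALUES (3, 'c')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```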

GROUP BY + ORDER BY make my query very slow

I am trying to figure out what I should change in my query and/or my table structure to improve a best-sellers query that takes over 1 second.
Here is the query I'm talking about:
SELECT pr.id_prod, MAX(pr.stock) AS stock, MAX(pr.dt_add) AS dt_add, SUM(od.quantity) AS quantity
FROM orders AS o
INNER JOIN orders_details AS od ON od.id_order = o.id_order
INNER JOIN products_references AS pr ON pr.id_prod_ref = od.id_prod_ref
INNER JOIN products AS p ON p.id_prod = pr.id_prod
WHERE o.id_order_status > 11
AND pr.active = 1
GROUP BY p.id_prod
ORDER BY quantity
LIMIT 10
If I use GROUP BY p.id_prod instead of GROUP BY pr.id_prod and remove the ORDER BY, the query runs in 0.07 sec.
Is that EXPLAIN output OK?
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | o | range | PRIMARY,id_order_status | id_order_status | 1 | | 75940 | Using where; Using index; Using temporary; Using filesort
1 | SIMPLE | od | ref | id_order,id_prod_ref | id_order | 4 | dbname.o.id_order | 1 |
1 | SIMPLE | pr | eq_ref | PRIMARY,id_prod | PRIMARY | 4 | dbname.od.id_prod_ref | 1 | Using where
1 | SIMPLE | p | eq_ref | PRIMARY,name_url,id_brand,name | PRIMARY | 4 | dbname.pr.id_prod | 1 | Using index
And this is the EXPLAIN without the ORDER BY
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | p | index | PRIMARY,name_url,id_brand,name | PRIMARY | 4 | | 1 | Using index
1 | SIMPLE | pr | ref | PRIMARY,id_prod | id_prod | 4 | dbname.p.id_prod | 2 | Using where
1 | SIMPLE | od | ref | id_order,id_prod_ref | id_prod_ref | 4 | dbname.pr.id_prod_ref | 67 |
1 | SIMPLE | o | eq_ref | PRIMARY,id_order_status | PRIMARY | 4 | dbname.od.id_order | 1 | Using where
And here is the table structures
CREATE TABLE `orders` (
`id_order` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_dir` int(10) unsigned DEFAULT NULL,
`id_status` tinyint(3) unsigned NOT NULL DEFAULT '11',
PRIMARY KEY (`id_order`),
KEY `id_dir` (`id_dir`),
KEY `id_status` (`id_status`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `orders_details` (
`id_order_det` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_order` int(10) unsigned NOT NULL,
`id_prod_ref` int(10) unsigned NOT NULL,
`quantity` smallint(5) unsigned NOT NULL DEFAULT '1',
PRIMARY KEY (`id_order_det`),
UNIQUE KEY `id_order` (`id_order`,`id_prod_ref`) USING BTREE,
KEY `id_prod_ref` (`id_prod_ref`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `products` (
`id_prod` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(60) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id_prod`),
FULLTEXT KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `products_references` (
`id_prod_ref` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_prod` int(10) unsigned NOT NULL,
`stock` smallint(6) NOT NULL DEFAULT '0',
`dt_add` datetime DEFAULT NULL,
`active` tinyint(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id_prod_ref`),
KEY `id_prod` (`id_prod`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I also tried to give you the table relations (ON UPDATE, ON DELETE CASCADE, ...) but didn't manage to export them. But I don't think they're crucial for now!
Try using the alias name in the ORDER BY rather than the column from the table, and GROUP BY the column that is in the SELECT list. This gives the same grouping for the join, because it is an INNER JOIN on equal values and the pr column is not retrieved in the result:
SELECT p.id_prod, p.name, SUM(od.quantity) AS quantity
FROM orders AS o
INNER JOIN orders_details AS od ON od.id_order = o.id_order
INNER JOIN products_references AS pr ON pr.id_prod_ref = od.id_prod_ref
INNER JOIN products AS p ON p.id_prod = pr.id_prod
WHERE pr.active = 1
GROUP BY p.id_prod
ORDER BY quantity
LIMIT 10
Do not forget to add appropriate indexes on the join columns.
(Rewritten after OP added more info.)
SELECT pr.id_prod,
MAX(pr.stock) AS max_stock,
MAX(pr.dt_add) AS max_dt_add,
SUM(od.quantity) AS sum_quantity
FROM orders AS o
INNER JOIN orders_details AS od
ON od.id_order = o.id_order
INNER JOIN products_references AS pr
ON pr.id_prod_ref = od.id_prod_ref
WHERE o.id_order_status > 11
AND pr.active = 1
GROUP BY pr.id_prod
ORDER BY sum_quantity
LIMIT 10
Note that p was removed as being irrelevant.
Beware of SUM() when using JOIN with GROUP BY -- you might get an incorrect, inflated, value.
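The inflated-SUM() effect shows up clearly in a toy example with Python's sqlite3. The product_tags table is hypothetical and exists only to create a one-to-many join. In the query above, pr is joined on its primary key, so this particular fan-out does not occur there. Aggregating in a derived table before joining is one common way to avoid it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders_details (id_order INTEGER, id_prod_ref INTEGER, quantity INTEGER,
                             PRIMARY KEY (id_order, id_prod_ref));
-- Hypothetical one-to-many table, used only to create a fan-out.
CREATE TABLE product_tags (id_prod_ref INTEGER, tag TEXT);
INSERT INTO orders_details VALUES (1, 100, 3), (2, 100, 2);  -- 5 units sold in total
INSERT INTO product_tags VALUES (100, 'sale'), (100, 'new');  -- 2 tags per product
""")

# Every detail row is repeated once per tag, so SUM() double-counts.
inflated = conn.execute("""
SELECT od.id_prod_ref, SUM(od.quantity)
FROM orders_details od
JOIN product_tags t ON t.id_prod_ref = od.id_prod_ref
GROUP BY od.id_prod_ref""").fetchall()

# Aggregate first, then join: one row per product before the fan-out.
correct = conn.execute("""
SELECT s.id_prod_ref, s.total, COUNT(t.tag)
FROM (SELECT id_prod_ref, SUM(quantity) AS total
      FROM orders_details GROUP BY id_prod_ref) AS s
JOIN product_tags t ON t.id_prod_ref = s.id_prod_ref
GROUP BY s.id_prod_ref, s.total""").fetchall()

print(inflated)  # [(100, 10)]
print(correct)   # [(100, 5, 2)]
```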
Improvement on one table:
CREATE TABLE `orders_details` (
`id_order` int(10) unsigned NOT NULL,
`id_prod_ref` int(10) unsigned NOT NULL,
`quantity` smallint(5) unsigned NOT NULL DEFAULT '1',
PRIMARY KEY (`id_order`,`id_prod_ref`),
INDEX (id_prod_ref, id_order)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Here's why: od sounds like a many:many mapping table. See here for tips on improving performance in it.
GROUP BY usually involves a sort. ORDER BY, when it is not identical to the GROUP BY, definitely requires another sort.
Removing the ORDER BY allows the query to return any 10 rows without the sort. (This may explain the timing difference.)
Note the alias sum_quantity to avoid ambiguity between the column quantity and your alias quantity.
Explaining EXPLAIN
1 SIMPLE o range id_order_status 1 75940 Using where; Using index; Using temporary; Using filesort
1 SIMPLE od ref id_order 4 o.id_order 1
1 SIMPLE pr eq_ref PRIMARY 4 od.id_prod_ref 1 Using where
1 SIMPLE p eq_ref PRIMARY 4 pr.id_prod 1 Using index
The tables will be accessed in the order given (o,od,pr,p).
o won't use the data ("Using index") but will scan the id_order_status index, which includes (id_status, id_order). Note: the PRIMARY KEY columns are implicitly added to any secondary key.
It estimates 76K will need to be scanned (for > 11).
Somewhere in the processing, there will be a temp table and a sort of it. This may or may not involve disk I/O.
The reach into od might find 1 row, might find 0 or more than 1 ("ref").
The reaching into pr and p are known to get at most 1 row.
pr does a small amount of filtering (active=1), but not until the third line of EXPLAIN. And no index is useful for this filtering. This could be improved, but only slightly, by a composite index (active, id_prod_ref). With only 5-10% being filtered out, this won't help much.
After all the JOINing and filtering, there will be two temp tables and sorts, one for GROUP BY, one for ORDER BY.
Only after that, will 10 rows be peeled off from the 70K (or so) rows collected up to this point.
Without the ORDER BY, the EXPLAIN shows that a different order seems to be better. And the tmp & sort went away.
1 SIMPLE p index PRIMARY 4 1 Using index
1 SIMPLE pr ref id_prod 4 p.id_prod 2 Using where
1 SIMPLE od ref id_prod_ref 4 pr.id_prod_ref 67
1 SIMPLE o eq_ref PRIMARY 4 dbname.od.id_order 1 Using where
There seems to be only 1 row in p, correct? So, in a way, it does not matter when this table is accessed. When you have multiple "products", all this analysis may change!
"key=PRIMARY", "Using index" is sort of a misnomer. It is really using the data, but being able to efficiently access it because the PRIMARY KEY is "clustered" with the data.
There is only one pr row?? Perhaps the optimizer realized that GROUP BY was not needed?
When it got to od, it estimated that "67" rows would be needed per p+pr combo.
You removed the ORDER BY, so there is no need to sort, and any 10 rows can be delivered.

How can I optimize MySQL query for update?

I have a table with 300 000 records. The table has duplicate rows, and I want to update the column "flag".
TABLE
------------------------------------
|number | flag | ... more column ...|
------------------------------------
|ABCD | 0 | ...................|
|ABCD | 0 | ...................|
|ABCD | 0 | ...................|
|BCDE | 0 | ...................|
|BCDE | 0 | ...................|
I use this query for updating "flag" column:
UPDATE table i
INNER JOIN (SELECT number FROM table
GROUP BY number HAVING count(number) > 1 ) i2
ON i.number = i2.number
SET i.flag = '1'
This query runs very slowly (more than 600 seconds) on these 300 000 records.
How can I optimize this query?
STRUCTURE OF MY TABLE
CREATE TABLE IF NOT EXISTS `inv` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`pn` varchar(10) NOT NULL COMMENT 'Part Number',
`qty` int(5) NOT NULL,
`qty_old` int(5) NOT NULL,
`flag_qty` tinyint(1) NOT NULL,
`name` varchar(60) NOT NULL,
`vid` int(11) NOT NULL ,
`flag_d` tinyint(1) NOT NULL ,
`flag_u` tinyint(1) NOT NULL ,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `pn` (`pn`),
KEY `name` (`name`),
KEY `vid` (`vid`),
KEY `pn_2` (`pn`),
KEY `flag_qty` (`flag_qty`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=0 ;
If "name" is duplicated, I want to update flag_qty.
If you do not already have an index on number you should add one -
CREATE INDEX table_number ON table (number);
Update: try this:
UPDATE inv t1
INNER JOIN inv t2
ON t1.name = t2.name
AND t1.id <> t2.id
SET t1.flag_qty = 1;
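SQLite has no multi-table UPDATE ... JOIN. This sketch therefore uses Python's sqlite3 as a stand-in for MySQL and expresses the same self-join condition (another row with the same name but a different id) as a correlated EXISTS subquery. The table is a trimmed version of inv. The part that carries over to MySQL is the index on name, which gives the per-row lookup an index to use instead of scanning the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inv (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                  flag_qty INTEGER NOT NULL DEFAULT 0);
CREATE INDEX idx_inv_name ON inv (name);  -- lets each probe look up rows by name
INSERT INTO inv (name) VALUES ('ABCD'), ('ABCD'), ('ABCD'), ('BCDE'), ('BCDE'), ('CDEF');
""")

# Same predicate as the self-join: some other row has the same name.
conn.execute("""
UPDATE inv SET flag_qty = 1
WHERE EXISTS (SELECT 1 FROM inv AS t2
              WHERE t2.name = inv.name AND t2.id <> inv.id)""")
conn.commit()

flags = conn.execute("SELECT name, flag_qty FROM inv ORDER BY id").fetchall()
print(flags)
```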
You can create your table with just the duplicates by selecting this data directly into another table instead of doing this flag update first.
INSERT INTO duplicate_invs
SELECT DISTINCT inv1.*
FROM inv AS inv1
INNER JOIN inv AS inv2
ON inv1.name = inv2.name
AND inv1.id < inv2.id
If you can explain the logic for which rows get deleted from inv table it may be that the whole process can be done in one step.
Get MySQL to EXPLAIN the query to you. Then you will see what indexing would improve things.
EXPLAIN will show you where it is slow. Here are some ideas on how to improve performance:
Add indexes
Use InnoDB foreign keys
Split the query into two and process them separately in the language you use.
Write the same idea as a MySQL stored procedure (not sure whether this would be fast).
I would use a temp table: 1) select all relevant records into a temp table and set an INDEX on id; 2) update the table using something like this:
UPDATE table i, tmp_i
SET i.flag = '1'
WHERE i.id = tmp_i.id
You can try this (assuming VB.NET, but it can be implemented in any language):
Cmd.CommandText = "SELECT GROUP_CONCAT(QUOTE(number)) FROM (SELECT number FROM table GROUP BY number HAVING COUNT(number) > 1) AS dup"
Dim ids As String = Convert.ToString(Cmd.ExecuteScalar())
After you get the comma-delimited list of numbers (note that the GROUP_CONCAT result is truncated to group_concat_max_len), use:
UPDATE table i
SET i.flag = '1'
WHERE i.number IN ( .... )
It can also be slow, but the first step (the SELECT) will not lock up your database, replication, etc., and the UPDATE will be faster.
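The two-step idea can be sketched with Python's sqlite3, standing in for MySQL and VB.NET. A read-only query collects the duplicated values, then a single UPDATE binds them as parameters. This avoids splicing a comma-delimited string into the SQL and the quoting problems that come with text values like 'ABCD'. The table is a trimmed version of inv. Very long lists may need to be sent in batches, since databases cap the number of bound parameters or the statement size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inv (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                  flag_qty INTEGER NOT NULL DEFAULT 0);
CREATE INDEX idx_inv_name ON inv (name);
INSERT INTO inv (name) VALUES ('ABCD'), ('ABCD'), ('ABCD'), ('BCDE'), ('BCDE'), ('CDEF');
""")

# Step 1: read-only query that collects the duplicated names.
dup_names = [row[0] for row in conn.execute(
    "SELECT name FROM inv GROUP BY name HAVING COUNT(*) > 1 ORDER BY name")]

# Step 2: one UPDATE with a bound placeholder per name (no string splicing).
if dup_names:
    placeholders = ", ".join("?" for _ in dup_names)
    conn.execute(f"UPDATE inv SET flag_qty = 1 WHERE name IN ({placeholders})", dup_names)
conn.commit()

flags = conn.execute("SELECT name, flag_qty FROM inv ORDER BY id").fetchall()
```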

Help me optimize this MySql query

I have a MySql query that take a very long time to run (about 7 seconds). The problem seems to be with the OR in this part of the query: "(tblprivateitem.userid=?userid OR tblprivateitem.userid=1)". If I skip the "OR tblprivateitem.userid=1" part it takes only 0.01 seconds. As I need that part I need to find a way to optimize this query. Any ideas?
QUERY:
SELECT
tbladdeditem.addeditemid,
tblprivateitem.iitemid,
tblprivateitem.itemid
FROM tbladdeditem
INNER JOIN tblprivateitem
ON tblprivateitem.itemid=tbladdeditem.itemid
AND (tblprivateitem.userid=?userid OR tblprivateitem.userid=1)
WHERE tbladdeditem.userid=?userid
EXPLAIN:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra
1 | SIMPLE | tbladdeditem | ref | userid | userid | 4 | const | 293 | Using where
1 | SIMPLE | tblprivateitem | ref | userid,itemid | itemid | 4 | tbladdeditem.itemid | 2 | Using where
TABLES:
tbladdeditem contains 1 100 000 rows:
CREATE TABLE `tbladdeditem` (
`addeditemid` int(11) NOT NULL auto_increment,
`itemid` int(11) default NULL,
`userid` mediumint(9) default NULL,
PRIMARY KEY (`addeditemid`),
KEY `userid` (`userid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
tblprivateitem contains 2 700 000 rows:
CREATE TABLE `tblprivateitem` (
`privateitemid` int(11) NOT NULL auto_increment,
`userid` mediumint(9) default '1',
`itemid` int(10) NOT NULL,
`iitemid` mediumint(9) default NULL,
PRIMARY KEY (`privateitemid`),
KEY `userid` (`userid`),
KEY `itemid` (`itemid`) -- Changed this index to use only itemid
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
UPDATE
I made my queries and schema match your original question exactly, multi-column key and all. The only possible difference is that I populated each table with two million entries. My query (your query) runs in 0.15 seconds.
delimiter $$
set @userid = 6
$$
SELECT
tbladdeditem.addeditemid, tblprivateitem.iitemid, tblprivateitem.itemid
FROM tbladdeditem
INNER JOIN tblprivateitem
ON tblprivateitem.itemid=tbladdeditem.itemid
AND (tblprivateitem.userid=@userid or tblprivateitem.userid = 1)
WHERE tbladdeditem.userid=@userid
$$
I have the same EXPLAIN that you do, and with my data my query returns over a thousand matches without any issue at all. I'm completely at a loss, as you really shouldn't be having these issues. Is it possible you are running a very limited version of MySQL? Are you running 64-bit? Plenty of memory?
I had assumed that your query wasn't performing well, and when mine was, I assumed I had fixed your problem. So now I eat crow. I'll post some of the avenues I went down. But I'm telling you, your query the way you posted it originally works just fine. I can only imagine your MySQL is thrashing the hard drive or something. Sorry I couldn't be more help.
PREVIOUS RESPONSE (Which is also an update)
I broke down and recreated your problem in my own database. After trying independent indexes on userid and on itemid, I was unable to get the query below a few seconds, so I set up very specific multi-column keys as directed by the query. Notice that on tbladdeditem the multi-column key begins with itemid, while on tblprivateitem the columns are reversed:
Here is the schema I used:
CREATE TABLE `tbladdeditem` (
`addeditemid` int(11) NOT NULL AUTO_INCREMENT,
`itemid` int(11) NOT NULL,
`userid` mediumint(9) NOT NULL,
PRIMARY KEY (`addeditemid`),
KEY `userid` (`userid`),
KEY `i_and_u` (`itemid`,`userid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `tblprivateitem` (
`privateitemid` int(11) NOT NULL AUTO_INCREMENT,
`userid` mediumint(9) NOT NULL DEFAULT '1',
`itemid` int(10) NOT NULL,
`iitemid` mediumint(9) NOT NULL,
PRIMARY KEY (`privateitemid`),
KEY `userid` (`userid`),
KEY `itemid` (`itemid`),
KEY `u_and_i` (`userid`,`itemid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I filled each table with 2 million entries of random data. I made some assumptions:
userid varies from 1 to 2000
itemid varies between 1 and 10000
This gives each user about a thousand entries in each table.
Here are two versions of the query (I'm using workbench for my editor):
Version 1 - do all the filtering on the join.
Result: 0.016 seconds to return 1297 rows
delimiter $$
set @userid = 3
$$
SELECT
a.addeditemid,
p.iitemid,
p.itemid
FROM tblprivateitem as p
INNER JOIN tbladdeditem as a
ON (p.userid in (1, @userid))
AND p.itemid = a.itemid
AND a.userid = @userid
$$
Here's the explain:
EXPLAIN:
id | select_type | table | type | key | ref | rows | extra
1 | SIMPLE | p | range | u_and_i | | 2150 | Using where; Using index
1 | SIMPLE | a | ref | i_and_u | | 1 | Using where; Using index
Version 2 - filter up front
Result: 0.015 seconds to return 1297 rows
delimiter $$
set @userid = 3
$$
SELECT
a.addeditemid,
p.iitemid,
p.itemid
from
(select userid, itemid, iitemid from tblprivateitem
where userid in (1, @userid)) as p
join tbladdeditem as a on a.itemid = p.itemid
where a.userid = @userid
$$
Here's the explain:
id | select_type | table | type | key | ref | rows | extra
1 | PRIMARY | <derived2> | ALL | null | null | 2152 |
1 | PRIMARY | a | ref | i_and_u | p.itemid,const | 1 | Using where; Using index
2 | DERIVED | p1 | range | u_and_i | | 2150 | Using where
Since you already have the predicate tbladdeditem.userid=?userid in the WHERE clause, I don't think the userid filter needs to be in the join condition. Try moving it from the join condition to the WHERE clause. If you are using the OR to handle the case where the parameter is null, use COALESCE instead of OR; if not, leave it as an OR.
-- If the OR provides a default for when ?userid is null:
SELECT a.addeditemid, p.iitemid, p.itemid
FROM tbladdeditem a
JOIN tblprivateitem p
ON p.itemid=a.itemid
WHERE a.userid=?userid
AND p.userid=Coalesce(?userid, 1)
-- If not:
SELECT a.addeditemid, p.iitemid, p.itemid
FROM tbladdeditem a
JOIN tblprivateitem p
ON p.itemid=a.itemid
WHERE a.userid=?userid
AND (p.userid=?userid Or p.userid = 1)
Second, if there is not an index on the userId column in these two tables, consider adding one.
Finally, if these all fail, try converting to two separate queries and unioning them together:
Select a.addeditemid, p.iitemid, p.itemid
From tbladdeditem a
Join tblprivateitem p
On p.itemid=a.itemid
Where a.userid=?userid
And p.userid=?userid
Union
Select a.addeditemid, p.iitemid, p.itemid
From tbladdeditem a
Join tblprivateitem p
On p.itemid=a.itemid
Where a.userid=?userid
And p.userid = 1
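One way to check that the OR and UNION forms return the same rows is to run both against a tiny data set. The sketch below uses Python's sqlite3 as a stand-in for MySQL, with :uid in place of ?userid. The rows and the composite (userid, itemid) indexes are hypothetical. Each UNION branch filters tbladdeditem on the requested user and pins tblprivateitem.userid to a single value. Note that UNION removes duplicate result rows. That keeps the result correct when ?userid is itself 1, but it would also collapse genuinely identical rows that the OR version returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbladdeditem (addeditemid INTEGER PRIMARY KEY, itemid INTEGER, userid INTEGER);
CREATE TABLE tblprivateitem (privateitemid INTEGER PRIMARY KEY, userid INTEGER,
                             itemid INTEGER, iitemid INTEGER);
CREATE INDEX idx_added_user_item ON tbladdeditem (userid, itemid);
CREATE INDEX idx_private_user_item ON tblprivateitem (userid, itemid);
INSERT INTO tbladdeditem (itemid, userid) VALUES (1, 6), (2, 6), (3, 7);
INSERT INTO tblprivateitem (userid, itemid, iitemid) VALUES
    (6, 1, 501),  -- private item of user 6
    (1, 2, 502),  -- shared item (userid 1)
    (7, 3, 503),  -- belongs to another user
    (1, 1, 504);  -- shared item (userid 1)
""")

OR_QUERY = """
SELECT a.addeditemid, p.iitemid, p.itemid
FROM tbladdeditem a JOIN tblprivateitem p ON p.itemid = a.itemid
WHERE a.userid = :uid AND (p.userid = :uid OR p.userid = 1)"""

# Each branch compares p.userid to a single value.
UNION_QUERY = """
SELECT a.addeditemid, p.iitemid, p.itemid
FROM tbladdeditem a JOIN tblprivateitem p ON p.itemid = a.itemid
WHERE a.userid = :uid AND p.userid = :uid
UNION
SELECT a.addeditemid, p.iitemid, p.itemid
FROM tbladdeditem a JOIN tblprivateitem p ON p.itemid = a.itemid
WHERE a.userid = :uid AND p.userid = 1"""

params = {"uid": 6}
or_rows = sorted(conn.execute(OR_QUERY, params).fetchall())
union_rows = sorted(conn.execute(UNION_QUERY, params).fetchall())
print(or_rows == union_rows)  # True
```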
I would try this instead: in your original JOIN you have an OR associated with a parameter, so move it to your WHERE clause.
SELECT
tbladdeditem.addeditemid,
tblprivateitem.iitemid,
tblprivateitem.itemid
FROM tbladdeditem
INNER JOIN tblprivateitem
ON tblprivateitem.itemid=tbladdeditem.itemid
WHERE tbladdeditem.userid=?userid
AND (tblprivateitem.userid=?userid OR tblprivateitem.userid=1)