Very slow query when using ORDER BY and LIMIT? - mysql

The following query takes 10 seconds to finish with ORDER BY. Without ORDER BY it finishes in 0.0005 seconds. I already have indexes on the fields "sku", "vid" and "timestamp". The table has more than 200,000 records. Please help: what is wrong with the query when using ORDER BY?
SELECT i.pn,i.sku,i.title, fl.f_inserted,fl.f_special, fl.f_notinserted
FROM inventory i
LEFT JOIN inventory_flags fl ON fl.sku = i.sku AND fl.vid = i.vid
WHERE i.qty >=2 ORDER BY i.timestamp LIMIT 0,100;
-- --------------------------------------------------------
--
-- Table structure for table `inventory`
--
CREATE TABLE IF NOT EXISTS `inventory` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`pn` varchar(60) DEFAULT NULL,
`sku` varchar(60) DEFAULT NULL,
`title` varchar(60) DEFAULT NULL,
`qty` int(11) DEFAULT NULL,
`vid` int(11) DEFAULT NULL,
`timestamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `vid` (`vid`),
KEY `sku` (`sku`),
KEY `timestamp` (`timestamp`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
-- --------------------------------------------------------
--
-- Table structure for table `inventory_flags`
--
CREATE TABLE IF NOT EXISTS `inventory_flags` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`f_inserted` tinyint(1) DEFAULT NULL,
`f_notinserted` tinyint(1) DEFAULT NULL,
`f_special` tinyint(1) DEFAULT NULL,
`timestamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`sku` varchar(60) DEFAULT NULL,
`vid` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `vid` (`vid`),
KEY `sku` (`sku`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
EXPLAIN result:
id  select_type  table  type    possible_keys  key        key_len  ref   rows  Extra
1   SIMPLE       fl     system  vid,sku        NULL       NULL     NULL  0     const row not found
1   SIMPLE       i      index   NULL           timestamp  5        NULL  10    Using where

Instead of adding separate indexes on individual columns, you need a multi-column index on each table, because you are using more than one column from the same table in the join condition.
After the columns from the WHERE clause, also include the columns used in the ORDER BY clause in the composite index.
Try adding the following indexes and test them using EXPLAIN:
ALTER TABLE inventory_flags ADD INDEX ix_if (sku, vid);
ALTER TABLE inventory ADD INDEX ix_i (sku, qty, timestamp);
Also try to avoid the DISTINCT clause in your query; it is equivalent to a GROUP BY clause. If you still need it, consider adding a covering index.

If sku is unique to each inventory item then define it as UNIQUE - it'll speed things up. (Or the combination of sku and vid - define a composite index in that case.)
Why are you doing SELECT DISTINCT? The vast majority of the time using DISTINCT is a sign that your query or your table structure is wrong.
Since it's DISTINCT, and sku is not UNIQUE it can't use the index on timestamp to speed things up, so it has to sort a table with 200,000 records - it can't even use an index on qty to speed that part up.
PS. Omesh has some good advice as well.
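A minimal sketch of the UNIQUE suggestion above, assuming the combination of sku and vid really is unique in inventory. The index name uq_sku_vid is made up for illustration. The ALTER fails with a duplicate-key error if repeated pairs already exist, so a check query comes first.

```sql
-- Look for repeated (sku, vid) pairs. Rows with NULL in either column are
-- skipped because a UNIQUE index allows any number of NULLs.
SELECT sku, vid, COUNT(*) AS copies
FROM inventory
WHERE sku IS NOT NULL AND vid IS NOT NULL
GROUP BY sku, vid
HAVING COUNT(*) > 1;

-- Hypothetical index name; only succeeds if the query above returns no rows.
ALTER TABLE inventory ADD UNIQUE KEY uq_sku_vid (sku, vid);
```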

You can use FORCE INDEX (index_name). Try it, and EXPLAIN will show that MySQL now uses that index for the ORDER BY.
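As a sketch of that hint applied to the query from the question, assuming the index to force is the existing single-column key named timestamp on inventory:

```sql
-- The hint goes directly after the table alias it applies to.
EXPLAIN
SELECT i.pn, i.sku, i.title, fl.f_inserted, fl.f_special, fl.f_notinserted
FROM inventory i FORCE INDEX (`timestamp`)
LEFT JOIN inventory_flags fl ON fl.sku = i.sku AND fl.vid = i.vid
WHERE i.qty >= 2
ORDER BY i.timestamp
LIMIT 0, 100;
```

Compare the key and Extra columns with and without the hint, and time both variants; a forced index is not guaranteed to be faster.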

Related

MySQL Partitioning a Table That Contains a Primary Key

I have a table that I want to partition:
CREATE TABLE `tbl_orders` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(50) NOT NULL DEFAULT '0' COLLATE 'utf8mb4_general_ci',
`system_id` INT(11) NOT NULL DEFAULT '0',
`created_at` DATETIME NULL DEFAULT NULL,
`updated_at` DATETIME NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE,
INDEX `system_id` (`system_id`) USING BTREE
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB
AUTO_INCREMENT=8
;
ALTER table tbl_orders
PARTITION BY HASH(system_id)
PARTITIONS 4;
Example of what I'm trying to achieve:
I have a table which I want to partition by system_id in order to speed up queries.
When I run the partition I get the following error:
/* SQL Error (1503): A PRIMARY KEY must include all columns in the table's partitioning function */
What would I change to run this partition successfully whilst still achieving my aim which is to split the table on system_id?
Is partitioning this way achievable with a primary key on the table?
PARTITIONing requires you to add the "partition key" (system_id) to every Unique index, including the PRIMARY KEY.
You will, I predict, find that PARTITION BY HASH is useless for performance. It may even slow down the query.
Please show a query that you hope to speed up; I will advise in more detail.
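One way to satisfy the rule about unique indexes, sketched against the tbl_orders definition from the question. This is only an illustration of the mechanics, not a recommendation that partitioning will help here.

```sql
-- Widen the primary key so it contains the partitioning column.
-- id stays first, which keeps AUTO_INCREMENT valid on InnoDB.
ALTER TABLE tbl_orders
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id, system_id);

-- With system_id in every unique key, error 1503 no longer applies.
ALTER TABLE tbl_orders
  PARTITION BY HASH(system_id)
  PARTITIONS 4;
```

The trade-off is that the key now enforces uniqueness of the (id, system_id) pair rather than of id alone.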

MySQL: Why is this SQL-query not using index?

I have a very simple SELECT that resorts to filesort and does not use index.
Consider the following query:
SELECT * FROM forum_topic
WHERE topic_status = 0
ORDER BY modified_date LIMIT 0, 30
on the following table (stripped of a few columns to make it more brief here)
CREATE TABLE `forum_topic` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`slug` varchar(255) NOT NULL,
`forum_id` int(10) NOT NULL DEFAULT '1',
`title` varchar(100) NOT NULL,
`topic_status` tinyint(1) NOT NULL DEFAULT '0',
`post_count` bigint(20) NOT NULL DEFAULT '0',
`modified_date` datetime NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `slug` (`slug`),
FULLTEXT KEY `title` (`title`),
KEY `modified` (`modified_date`, `topic_status`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
EXPLAIN gives the following output
id  select_type  table        partitions  type  possible_keys  key   key_len  ref   rows  Extra
1   SIMPLE       forum_topic  NULL        ALL   NULL           NULL  NULL     NULL  2075  Using where; Using filesort
Notice how EXPLAIN shows NULL for possible_keys and how it's using filesort after having scanned ALL rows.
Please advise. Thanks.
This query needs topic_status to appear in the most significant position of an index, because it's searching on a constant.
You have
KEY `modified` (`modified_date`, `topic_status`)
and you may want
KEY `mod2` (`topic_status`, `modified_date` )
instead. This may satisfy both the filter and the ORDER BY ... LIMIT part of the query.
Pro tip: Avoid SELECT * and enumerate the columns you actually need instead.
Pro tip: Filesort doesn't necessarily mean what you think it means. It's used anytime MySQL needs to construct an intermediate result set for such things as sorting.
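A short sketch of how the suggested index could be added and checked. The column list in the SELECT is only an example of enumerating columns instead of using SELECT *.

```sql
-- Equality column first, ORDER BY column second.
ALTER TABLE forum_topic ADD KEY mod2 (topic_status, modified_date);

EXPLAIN
SELECT id, slug, title, modified_date
FROM forum_topic
WHERE topic_status = 0
ORDER BY modified_date
LIMIT 0, 30;
```

If the optimizer picks mod2, the key column should name it and "Using filesort" should disappear from Extra. On a table this small it may still prefer a full scan.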

Optimizing aggregation on MySQL Table with 850 million rows

I have a query that I'm using to summarize via aggregations.
The table is called 'connections' and has about 843 million rows.
CREATE TABLE `connections` (
`app_id` varchar(16) DEFAULT NULL,
`user_id` bigint(20) DEFAULT NULL,
`time_started_dt` datetime DEFAULT NULL,
`device` varchar(255) DEFAULT NULL,
`os` varchar(255) DEFAULT NULL,
`firmware` varchar(255) DEFAULT NULL,
KEY `app_id` (`app_id`),
KEY `time_started_dt` (`time_started_dt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
When I try to run a query, such as the one below, it takes over 10 hours and I end up killing it. Does anyone see any mistakes that I'm making, or have any suggestions as to how I could optimize the query?
SELECT
app_id,
MAX(time_started_dt),
MIN(time_started_dt),
COUNT(*)
FROM
connections
GROUP BY
app_id
I suggest you create a composite index on (app_id, time_started_dt):
ALTER TABLE connections ADD INDEX(app_id, time_started_dt)
To get that query to perform, you really need a suitable covering index, with app_id as the leading column, e.g.
CREATE INDEX `connections_IX1` ON `connections` (`app_id`, `time_started_dt`);
NOTE: creating the index may take hours, and the operation will prevent insert/update/delete to the table while it is running.
An EXPLAIN will show the proposed execution plan for your query. With the covering index in place, you'll see "Using index" in the plan. (A "covering index" is an index that can be used by MySQL to satisfy a query without having to access the underlying table. That is, the query can be satisfied entirely from the index.)
With the large number of rows in this table, you may also want to consider partitioning.
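To check whether the composite index is actually covering the query, the plan can be inspected like this (a sketch, assuming an index on app_id and time_started_dt has been created):

```sql
-- Every column the query touches is in the index, so Extra would be
-- expected to include "Using index" and no table rows need to be read.
EXPLAIN
SELECT app_id,
       MAX(time_started_dt),
       MIN(time_started_dt),
       COUNT(*)
FROM connections
GROUP BY app_id;
```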
I have tried your query on randomly generated data (around 1 million rows). Adding a PRIMARY KEY improved the performance of your query by 10% in that test.
As already suggested by other people composite index should be added to the table. Index time_started_dt is useless.
CREATE TABLE `connections` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`app_id` varchar(16) DEFAULT NULL,
`user_id` bigint(20) DEFAULT NULL,
`time_started_dt` datetime DEFAULT NULL,
`device` varchar(255) DEFAULT NULL,
`os` varchar(255) DEFAULT NULL,
`firmware` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `composite_idx` (`app_id`,`time_started_dt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

optimize query (2 simple left joins)

SELECT fcat.id,fcat.title,fcat.description,
count(DISTINCT ftopic.id) as number_topics,
count(DISTINCT fpost.id) as number_posts FROM fcat
LEFT JOIN ftopic ON fcat.id=ftopic.cat_id
LEFT JOIN fpost ON ftopic.id=fpost.topic_id
GROUP BY fcat.id
ORDER BY fcat.ord
LIMIT 100;
Indexes on ftopic.cat_id, fpost.topic_id, fcat.ord
EXPLAIN:
id  select_type  table   type  possible_keys     key         key_len  ref              rows  Extra
1   SIMPLE       fcat    ALL   PRIMARY           NULL        NULL     NULL             11    Using temporary; Using filesort
1   SIMPLE       ftopic  ref   PRIMARY,cat_id_2  cat_id_2    4        bloki.fcat.id    72
1   SIMPLE       fpost   ref   topic_id_2        topic_id_2  4        bloki.ftopic.id  245
fcat - 11 rows,
ftopic - 1106 rows,
fpost - 363000 rows
Query takes 4.2 sec
TABLES:
CREATE TABLE IF NOT EXISTS `fcat` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(250) collate utf8_unicode_ci NOT NULL,
`description` varchar(250) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`ord` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `ord` (`ord`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=12 ;
CREATE TABLE IF NOT EXISTS `ftopic` (
`id` int(11) NOT NULL auto_increment,
`cat_id` int(11) NOT NULL,
`title` varchar(100) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`updated` timestamp NOT NULL default CURRENT_TIMESTAMP,
`lastname` varchar(200) collate utf8_unicode_ci NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`closed` tinyint(4) NOT NULL default '0',
`views` int(11) NOT NULL default '1',
PRIMARY KEY (`id`),
KEY `cat_id_2` (`cat_id`,`updated`,`visible`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1116 ;
CREATE TABLE IF NOT EXISTS `fpost` (
`id` int(11) NOT NULL auto_increment,
`topic_id` int(11) NOT NULL,
`pet_id` int(11) NOT NULL,
`content` text collate utf8_unicode_ci NOT NULL,
`imageName` varchar(300) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`reply_id` int(11) NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`md5` varchar(100) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `md5` (`md5`),
KEY `topic_id_2` (`topic_id`,`created`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=390971 ;
Thanks,
hamlet
You need to create a key with both fcat.id and fcat.ord.
Bold rewrite
This code is not functionally identical, but...
Because you want to know about distinct ftopic.id and fpost.id, I'm going to be bold and suggest two INNER JOINs instead of LEFT JOINs. (Categories without topics, and topics without posts, then drop out of the result.)
Each fpost row appears at most once in the joined result, so you can drop the DISTINCT on fpost.id. ftopic.id still repeats once per post, so its DISTINCT has to stay; otherwise number_topics would simply equal number_posts.
SELECT
fcat.id
, fcat.title
, fcat.description
, count(DISTINCT ftopic.id) as number_topics
, count(fpost.id) as number_posts
FROM fcat
INNER JOIN ftopic ON fcat.id = ftopic.cat_id
INNER JOIN fpost ON ftopic.id = fpost.topic_id
GROUP BY fcat.id
ORDER BY fcat.ord
LIMIT 100;
It depends on your data whether this is what you are looking for, but I'm guessing it will be faster.
All your indexes seem to be in order though.
MySQL does not use indexes for small sample sizes!
Note that the EXPLAIN lists only 11 rows for MySQL to consider in fcat. This is not enough for MySQL to really start worrying about indexes, so it doesn't, because going to the index for small row counts slows things down.
MySQL is trying to speed things up, so it chooses not to use the index. This confuses a lot of people because we are trained so hard on the index. Small sample sizes don't give good EXPLAINs!
Increase the size of the test data so MySQL has more rows to consider and you should start seeing the index being used.
Common misconceptions about force index
Force index does not force MySQL to use an index as such.
It hints at MySQL to use a different index from the one it might naturally use and it pushes MySQL into using an index by setting a very high cost on a table scan.
(In your case MySQL is not using a table scan, so force index has no effect)
MySQL (like most other DBMSs on the planet) has a very strong urge to use indexes, so if it doesn't use any, that's because using no index at all is faster.
How does MySQL know which index to use
One of the parameters the query optimizer uses is the stored cardinality of the indexes.
Over time these values change... But studying the table takes time, so MySQL doesn't do that unless you tell it to.
Another parameter that affects index selection is the predicted disk-seek-times that MySQL expects to encounter when performing the query.
Tips to improve index usage
ANALYZE TABLE will instruct MySQL to re-evaluate the indexes and update its key distribution (cardinality). (consider running it daily/weekly in a cron job)
SHOW INDEX FROM table will display the key distribution.
MyISAM tables and indexes fragment over time. Use OPTIMIZE TABLE to unfragment the tables and recreate the indexes.
FORCE/USE/IGNORE INDEX limits the options MySQL's query optimizer has to perform your query. Only consider it on complex queries.
Time the effect of your meddling with indexes on a regular basis. A forced index that speeds up your query today might slow it down tomorrow because the underlying data has changed.
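The tips above map onto the following statements, shown here against the tables from the question. The final SELECT is only an illustrative query for the hint syntax, not a suggested replacement for the original one.

```sql
-- Refresh the stored key distribution (cardinality) for all three tables.
ANALYZE TABLE fcat, ftopic, fpost;

-- Inspect the cardinality the optimizer currently sees.
SHOW INDEX FROM fpost;

-- Defragment a MyISAM table and rebuild its indexes.
OPTIMIZE TABLE fpost;

-- Index hint: steer the optimizer towards the ord index on fcat.
SELECT id, title
FROM fcat FORCE INDEX (ord)
ORDER BY ord
LIMIT 100;
```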

MySQL ORDER BY optimization in many to many tables

Tables:
CREATE TABLE IF NOT EXISTS `posts` (
`post_n` int(10) NOT NULL auto_increment,
`id` int(10) default NULL,
`date` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`post_n`,`visibility`),
KEY `id` (`id`),
KEY `date` (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
CREATE TABLE IF NOT EXISTS `subscriptions` (
`subscription_n` int(10) NOT NULL auto_increment,
`id` int(10) NOT NULL,
`subscribe_id` int(10) NOT NULL,
PRIMARY KEY (`subscription_n`),
KEY `id` (`id`),
KEY `subscribe_id` (`subscribe_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Query:
SELECT posts.* FROM posts, subscriptions
WHERE posts.id=subscriptions.subscribe_id AND subscriptions.id=1
ORDER BY date DESC LIMIT 0, 15
It's slow because the indexes "id" and "subscribe_id" are used but the index "date" is not, so the ordering is very slow.
Are there any options to change the query, indexes, or architecture?
Possible Improvements:
First, you'll gain a couple microseconds per query if you name your fields instead of using SELECT posts.* which causes a schema lookup. Change your query to:
SELECT posts.post_n, posts.id, posts.date
FROM posts, subscriptions
WHERE posts.id=subscriptions.subscribe_id
AND subscriptions.id=1
ORDER BY date DESC
LIMIT 0, 15
Next, this requires MySQL 5.1 or higher, but you might want to consider partitioning your tables. You might consider KEY partitioning for both tables.
This should get you started.
http://dev.mysql.com/doc/refman/5.1/en/partitioning-types.html
E.g.
SET SQL_MODE = 'ANSI';
-- to allow default date
CREATE TABLE IF NOT EXISTS `posts` (
`post_n` int(10) NOT NULL auto_increment,
`id` int(10) default NULL,
`date` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`post_n`,`id`),
KEY `id` (`id`),
KEY `date` (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin
PARTITION BY KEY(id) PARTITIONS 32;
--
CREATE TABLE IF NOT EXISTS `subscriptions` (
`subscription_n` int(10) NOT NULL auto_increment,
`id` int(10) NOT NULL,
`subscribe_id` int(10) NOT NULL,
PRIMARY KEY (`subscription_n`,`subscribe_id`),
KEY `id` (`id`),
KEY `subscribe_id` (`subscribe_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin
PARTITION BY KEY(subscribe_id) PARTITIONS 32;
I had to adjust your primary key a bit, so beware: this may NOT work for you. Please test it and make sure; I hope it does, though. Make sure to run sysbench against the old and new structures/queries to compare results before going to production.
:-)
If you're able to modify the table, you could add a multi-field index containing both id and date (or modify one of the existing keys to contain them both).
If you can't make changes to the database, and if you know that your result set is going to be small, you can force it to use a specific named key with USE KEY(name). The ordering would then be done after the fact, just on the results returned.
Hope that helps.
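A rough sketch of the multi-field index idea, using the posts and subscriptions tables from the question. The index name id_date is hypothetical, and the comma join is written as an explicit JOIN for readability.

```sql
-- Composite index: join column first, sort column second.
ALTER TABLE posts ADD INDEX id_date (id, `date`);

-- USE INDEX asks the optimizer to consider only the named index on posts.
EXPLAIN
SELECT posts.post_n, posts.id, posts.`date`
FROM posts USE INDEX (id_date)
JOIN subscriptions ON posts.id = subscriptions.subscribe_id
WHERE subscriptions.id = 1
ORDER BY posts.`date` DESC
LIMIT 0, 15;
```

Because rows for several subscribed ids are merged, MySQL may still need a filesort over the matched rows; the index mainly narrows which rows are read, so check the EXPLAIN output and timings before keeping it.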