Optimize MYSQL Select query in large table - mysql

Given the table:
CREATE TABLE `sample` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`vendorid` VARCHAR(45) NOT NULL,
`year` INT(10) NOT NULL,
`title` TEXT NOT NULL,
`description` TEXT NOT NULL
PRIMARY KEY (`id`) USING BTREE
)
Table size: over 7 million. All fields are not unique, except id.
Simple query:
SELECT * FROM sample WHERE title='milk'
Takes over 45s-60s to complete.
Tried to put unique index on title and description but got 1170 error.
How could I optimize it? Would be very grateful for suggestions.

TEXT columns need prefix indexes -- it's not possible to index their entire contents; they can be too large. And, if the column values aren't unique, don't use UNIQUE indexes; they won't work.
Try this:
ALTER TABLE simple ADD INDEX title_prefix (title(64));
Pro tip For columns you need to use in WHERE statements, do your best to use VARCHAR(n) where n is less than 768. Avoid TEXT and other blob types unless you absolutely need them; they can make for inefficient operation of your database server.

Related

why mysql still use index to get data when use the 2nd col of multiple column index in mysql?

Why mysql still use index to get data when use the 2nd col of multiple column index in mysql?
We know mysql use leftmost match rule, but here I didn't use the 1st col and I use the 2nd col, the two select operation results bellow show that mysql sometimes use index and sometimes didn't use it. Why? In addtion, my mysql version is 5.6.17.
1.create table:
CREATE TABLE `student` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`cid` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `name_cid_INX` (`name`,`cid`)
) ENGINE=InnoDB AUTO_INCREMENT=101 DEFAULT CHARSET=utf8
2.run select:
EXPLAIN SELECT * FROM student WHERE cid=1;
3. result:
Result with index
It shows that mysql use index to get data.
The following is another table.
1.create table:
CREATE TABLE `test_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(45) DEFAULT NULL,
`birthday` datetime DEFAULT NULL,
`address` varchar(45) DEFAULT NULL,
`phone` varchar(45) DEFAULT NULL,
`note` varchar(45) DEFAULT NULL,
`age` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `NAME` (`name`),
KEY `AGE` (`age`),
KEY `LeftMostPreFix` (`name`,`address`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
2.run select:
explain SELECT * FROM test.test_table where address = '东京'
3.result:
Result without index
On the contrary here it shows that mysql didn't use index to get data.
Comparing above two results, I feel puzzled why the 1st result use index which is against leftmost match rule.
From the mysql manual
it is possible that key will name an index that is not present in the possible_keys value. This can happen if none of the possible_keys indexes are suitable for looking up rows, but all the columns selected by the query are columns of some other index. That is, the named index covers the selected columns, so although it is not used to determine which rows to retrieve, an index scan is more efficient than a data row scan.
So while there is a key used here, it's not actually used in the normal sense. In some situations it is still more efficient to use that as a table scan (in your first example), in others it might not be (in your second)
Most of the times these things are decided by the optimizer based on several things (usage of the table, etc).
Best thing to remember is that here you can NOT "use the index", and that's why there is no index in possible keys. You can only use the index if the first column is in there.
Neither index in either Case starts with what is in the WHERE, so there will be a full scan of table or of index.
Case 1: The index is "covering", so it is a tossup as to which (table scan vs index scan) is better. The Optimizer happened to pick the secondary index. EXPLAIN FORMAT=JSON SELECT ... may have enough details to explain 'why' in this case.
Case 2: Because of * (in SELECT *), the secondary index is at a disadvantage -- it is not "covering", so the processing will bounce back and forth between the index and the data. So it is clearly better to simply scan the table.
Instead of trying to understand EXPLAIN (in these cases), turn the question around: "What is the optimal index for this query against this table?" Then follow the guidelines here.

Should I be using multiple single-column indexes or a single multi-column index?

This is a pretty basic question, but I'm confused by what I'm reading in various places. I have a simple table that doesn't contain a huge amount of data (less than 500 rows for any given db is typical) A typical query against this table looks like :
select system_fields.name from system_fields where system_fields.form_id=? and system_fields.field_id=?
My question is, should I have a separate index for form_id and one for field_id, or should I be creating an index on a combination of those two fields? I've never really done anything with multi-column indexes before.
CREATE TABLE IF NOT EXISTS `system_fields` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`field_id` int(11) NOT NULL,
`form_id` int(11) NOT NULL,
`name` varchar(50) NOT NULL,
`reference_field_id` varchar(1000) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `field_id` (`field_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=293 ;
If you are always going to query by these two fields, then add a multi-column index.
I'll also point out that if you're going to have < 500 rows in the table, your index may not even get used. Any performance difference with or without an index on a 500-row table will be negligible.
Here's a bit more (good) reading:
https://www.percona.com/blog/2014/01/03/multiple-column-index-vs-multiple-indexes-with-mysql-56/

Should we use the "LIMIT clause" in following example?

There is a structure:
CREATE TABLE IF NOT EXISTS `categories` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`parent_id` int(11) unsigned NOT NULL DEFAULT '0',
`title` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Query_1:
SELECT * FROM `categories` WHERE `id` = 1234
Query_2:
SELECT * FROM `categories` WHERE `id` = 1234 LIMIT 1
I need to get just one row. Since we apply WHERE id=1234 (finding by PRIMARY KEY) obviously that row with id=1234 is only one in whole table.
After MySQL has found the row, whether engine to continue the search when using Query_1?
Thanks in advance.
Look at this SQLFiddle: http://sqlfiddle.com/#!2/a8713/4 and especially View Execution Plan.
You see, that MySQL recognizes the predicate on a PRIMARY column and therefore it does not matter if you add LIMIT 1 or not.
PS: A little more explanation: Look at the column rows of the Execution Plan. The number there is the amount of columns, the query engine thinks, it has to examine. Since the columns content is unique (as it's a primary key), this is 1. Compare it to this: http://sqlfiddle.com/#!2/9868b/2 same schema but without primary key. Here rows says 8. (The execution plan is explained in the German MySQL reference, http://dev.mysql.com/doc/refman/5.1/en/explain.html the English one is for some reason not so detailed.)

Will this MySQL index increase performance?

I have the following MySQL table:
CREATE TABLE `my_data` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`tstamp` int(10) unsigned NOT NULL DEFAULT '0',
`name` varchar(255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
I want to optimise for the following SELECT query only:
SELECT `id`, `name` FROM `my_data` ORDER BY `name` ASC
Will adding the following index increase performance, regardless of the size of the table?
CREATE INDEX `idx_name_id` ON `my_data` (`name`,`id`);
An EXPLAIN query suggests it would, but I have no quick way of testing with a large data set.
Yes it would. Even though you are still doing a full table scan, having the index will make the sort operation (due to the order by) unnecessary.
But it will also add overhead to inserts and delete statements!
Index are usefull when you use the column in the where clause, or when the column is a part of the condition for a link between two tables. See MySQL Doc

Is really necessary to have two fulltext indexes for text columns?

I inherited the codebase for a custom CMS built with MySQL and PHP which uses fulltext indexes to search in content (text) fields. When analyzing the database structure I found that all relevant tables were created in the following fashion (simplified example):
CREATE TABLE `stories` (
`story_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`headline` varchar(255) NOT NULL DEFAULT '',
`subhead` varchar(255) DEFAULT NULL,
`content` text NOT NULL,
PRIMARY KEY (`story_id`),
FULLTEXT KEY `fulltext_search` (`headline`,`subhead`,`content`),
FULLTEXT KEY `headline` (`headline`),
FULLTEXT KEY `subhead` (`subhead`),
FULLTEXT KEY `content` (`content`)
) ENGINE=MyISAM;
As you can see, the fulltext index is created in the usual way but then each column is added individually as well, which I believe creates two different indexes.
I've contacted the prior developer and he says that this is the "proper" way to create fulltext indexes, but according to every single example I've found in the Internet, there's no such requirement and this would be enough:
CREATE TABLE `stories` (
`story_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`headline` varchar(255) NOT NULL DEFAULT '',
`subhead` varchar(255) DEFAULT NULL,
`content` text NOT NULL,
PRIMARY KEY (`story_id`),
FULLTEXT KEY `fulltext_search` (`headline`,`subhead`,`content`)
) ENGINE=MyISAM;
The table has over 80,000 rows and is becoming increasingly hard to manage (the full database is near to 10GB) so I'd like to get rid of any unnecessary data.
Many thanks in advance.
The way to figure it out for yourself is to use EXPLAIN with the queries (matches) to see what indexes are actually used. If you have a query that doesn't use an index and is slow, make an index (or tell it manually to USE an index_hint), then try the EXPLAIN again to see if the index gets used.
I would expect that if your users are allowed to specify just one column to search on, and that column isn't first or the only one in the list of indexed columns, the query/match would use a non-indexed sequential search. In other words, with your index on (headline,subhead,content) I would expect the index to be used for any search with all three columns, or with just the headline, or with headline and subhead, but not for just subhead, and not for just content. I haven't done it in a while, so something might be different nowadays; but EXPLAIN should reveal what is going on.
If you examine all the possible queries with EXPLAIN and find that an index isn't used by any of them, you don't need it.