Is it really necessary to have two fulltext indexes for text columns? - mysql

I inherited the codebase for a custom CMS built with MySQL and PHP which uses fulltext indexes to search in content (text) fields. When analyzing the database structure I found that all relevant tables were created in the following fashion (simplified example):
CREATE TABLE `stories` (
`story_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`headline` varchar(255) NOT NULL DEFAULT '',
`subhead` varchar(255) DEFAULT NULL,
`content` text NOT NULL,
PRIMARY KEY (`story_id`),
FULLTEXT KEY `fulltext_search` (`headline`,`subhead`,`content`),
FULLTEXT KEY `headline` (`headline`),
FULLTEXT KEY `subhead` (`subhead`),
FULLTEXT KEY `content` (`content`)
) ENGINE=MyISAM;
As you can see, the combined fulltext index is created in the usual way, but then each column is also indexed individually, which I believe indexes the same columns twice.
I've contacted the prior developer and he says that this is the "proper" way to create fulltext indexes, but according to every example I've found on the Internet, there's no such requirement and this would be enough:
CREATE TABLE `stories` (
`story_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`headline` varchar(255) NOT NULL DEFAULT '',
`subhead` varchar(255) DEFAULT NULL,
`content` text NOT NULL,
PRIMARY KEY (`story_id`),
FULLTEXT KEY `fulltext_search` (`headline`,`subhead`,`content`)
) ENGINE=MyISAM;
The table has over 80,000 rows and is becoming increasingly hard to manage (the full database is close to 10 GB), so I'd like to get rid of any unnecessary data.
Many thanks in advance.

The way to figure it out for yourself is to use EXPLAIN on the queries (the MATCH searches) to see which indexes are actually used. If you have a query that doesn't use an index and is slow, create an index (or add a USE INDEX hint), then run EXPLAIN again to see whether the index gets used.
I would expect that if your users are allowed to specify just one column to search on, and that column isn't the first or the only one in the list of indexed columns, the query/match would fall back to a non-indexed sequential scan. In other words, with your index on (headline, subhead, content) I would expect the index to be used for any search across all three columns, or just headline, or headline and subhead, but not for subhead alone, and not for content alone. I haven't done this in a while, so something might be different nowadays; but EXPLAIN should reveal what is going on.
If you examine all the possible queries with EXPLAIN and find that an index isn't used by any of them, you don't need it.
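For example, a minimal sketch against the stories table above (the search term is illustrative). Note that MySQL only uses a FULLTEXT index whose column list exactly matches the MATCH() arguments:
EXPLAIN SELECT story_id FROM stories
WHERE MATCH(headline, subhead, content) AGAINST('budget');
-- Can use the combined index `fulltext_search`.
EXPLAIN SELECT story_id FROM stories
WHERE MATCH(headline) AGAINST('budget');
-- Needs the single-column index `headline`; without one, MySQL raises
-- ERROR 1191: Can't find FULLTEXT index matching the column list.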

Related


Why does MySQL still use the index when querying on the 2nd column of a multiple-column index?
We know MySQL uses the leftmost-prefix match rule, but here I didn't use the 1st column of the index, only the 2nd. The two SELECT results below show that MySQL sometimes uses the index and sometimes doesn't. Why? In addition, my MySQL version is 5.6.17.
1.create table:
CREATE TABLE `student` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`cid` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `name_cid_INX` (`name`,`cid`)
) ENGINE=InnoDB AUTO_INCREMENT=101 DEFAULT CHARSET=utf8
2.run select:
EXPLAIN SELECT * FROM student WHERE cid=1;
3. result:
(Screenshot: EXPLAIN output, result with index)
It shows that MySQL uses an index to get the data.
The following is another table.
1.create table:
CREATE TABLE `test_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(45) DEFAULT NULL,
`birthday` datetime DEFAULT NULL,
`address` varchar(45) DEFAULT NULL,
`phone` varchar(45) DEFAULT NULL,
`note` varchar(45) DEFAULT NULL,
`age` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `NAME` (`name`),
KEY `AGE` (`age`),
KEY `LeftMostPreFix` (`name`,`address`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
2.run select:
explain SELECT * FROM test.test_table where address = '东京'
3.result:
(Screenshot: EXPLAIN output, result without index)
Here, by contrast, MySQL did not use an index to get the data.
Comparing the two results above, I'm puzzled why the first one uses the index, which seems to go against the leftmost-prefix rule.
From the MySQL manual:
it is possible that key will name an index that is not present in the possible_keys value. This can happen if none of the possible_keys indexes are suitable for looking up rows, but all the columns selected by the query are columns of some other index. That is, the named index covers the selected columns, so although it is not used to determine which rows to retrieve, an index scan is more efficient than a data row scan.
So while a key is shown here, it isn't actually used in the normal sense: the index is merely scanned instead of the table. In some situations such an index scan is more efficient than a table scan (your first example); in others it is not (your second).
Most of the time these decisions are made by the optimizer based on several factors (table statistics, usage patterns, etc.).
The key thing to remember is that here you can NOT "use the index" for row lookups, and that's why it does not appear in possible_keys. You can only seek into the index when its first column appears in the WHERE clause.
Neither table has an index that starts with the column in the WHERE clause, so there will be a full scan, either of the table or of an index.
Case 1: The index is "covering", so it is a tossup whether a table scan or an index scan is better. The optimizer happened to pick the secondary index. EXPLAIN FORMAT=JSON SELECT ... may have enough details to explain why in this case.
Case 2: Because of the * (in SELECT *), the secondary index is at a disadvantage: it is not "covering", so processing would have to bounce back and forth between the index and the data. So it is clearly better to simply scan the table.
Instead of trying to understand EXPLAIN (in these cases), turn the question around: "What is the optimal index for this query against this table?" Then follow the guidelines here.
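For instance, a minimal sketch of indexes that would let each query above seek directly to its rows (the index names are illustrative):
-- For: SELECT * FROM student WHERE cid=1;
ALTER TABLE `student` ADD INDEX `cid_INX` (`cid`);
-- For: SELECT * FROM test_table WHERE address = '东京';
ALTER TABLE `test_table` ADD INDEX `address_INX` (`address`);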

Should I be using multiple single-column indexes or a single multi-column index?

This is a pretty basic question, but I'm confused by what I'm reading in various places. I have a simple table that doesn't contain a huge amount of data (less than 500 rows for any given db is typical). A typical query against this table looks like:
select system_fields.name from system_fields where system_fields.form_id=? and system_fields.field_id=?
My question is, should I have a separate index for form_id and one for field_id, or should I be creating an index on a combination of those two fields? I've never really done anything with multi-column indexes before.
CREATE TABLE IF NOT EXISTS `system_fields` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`field_id` int(11) NOT NULL,
`form_id` int(11) NOT NULL,
`name` varchar(50) NOT NULL,
`reference_field_id` varchar(1000) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `field_id` (`field_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=293 ;
If you are always going to query by these two fields, then add a multi-column index.
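A minimal sketch of that multi-column index (the name is illustrative):
ALTER TABLE `system_fields` ADD INDEX `form_field` (`form_id`, `field_id`);
With this in place, the query above can seek on both conditions, and the existing single-column `field_id` index becomes redundant for it.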
I'll also point out that if you're going to have < 500 rows in the table, your index may not even get used. Any performance difference with or without an index on a 500-row table will be negligible.
Here's a bit more (good) reading:
https://www.percona.com/blog/2014/01/03/multiple-column-index-vs-multiple-indexes-with-mysql-56/

Best way to speed up a query on an InnoDB table with 100,000,000 rows in MySQL 5.6

I have a MySQL 5.6 table with 70 million rows in it, and it will grow to 100+ million rows in a few weeks.
I have a dedicated machine with a humble 500GB disk and 4GB RAM and the innodb_buffer_pool_size is set to 2GB.
The workload is 99% selects and 1% inserts (done once a month).
The most important column is descripcion_detallada_producto varchar(300), and it is what the selects target 90% of the time.
My table is:
CREATE TABLE `t1` (
`N_orden` bigint(20) NOT NULL DEFAULT '0',
`Fecha` varchar(15) COLLATE latin1_spanish_ci DEFAULT NULL,
`Ncm` int(11) NOT NULL,
`Origen` int(11) NOT NULL,
`Adquisicion` int(11) NOT NULL,
`Medida_Estadistica` int(11) NOT NULL,
`Unidad_Comercializacion` varchar(30) COLLATE latin1_spanish_ci DEFAULT NULL,
`Descripcion_Detallada_Producto` varchar(300) COLLATE latin1_spanish_ci DEFAULT NULL,
`Cantidad_Estadistica` double DEFAULT NULL,
`Peso_Liquido_Kg` double DEFAULT NULL,
`Valor_Fob` double DEFAULT NULL,
`Valor_Frete` double DEFAULT NULL,
`Valor_Seguro` double DEFAULT NULL,
`Valor_Unidad` double DEFAULT NULL,
`Cantidad` double DEFAULT NULL,
`Valor_Total` double DEFAULT NULL,
PRIMARY KEY (`N_orden`),
KEY `Ncm` (`Ncm`),
KEY `Origen` (`Origen`),
KEY `Adquisicion` (`Adquisicion`),
KEY `Medida_Estadistica` (`Medida_Estadistica`),
KEY `Descripcion_Detallada_Producto` (`Descripcion_Detallada_Producto`),
CONSTRAINT `t1_ibfk_1` FOREIGN KEY (`Ncm`) REFERENCES `ncm` (`Ncm`),
CONSTRAINT `t1_ibfk_2` FOREIGN KEY (`Origen`) REFERENCES `paises` (`Codigo_Pais`),
CONSTRAINT `t1_ibfk_3` FOREIGN KEY (`Adquisicion`) REFERENCES `paises` (`Codigo_Pais`),
CONSTRAINT `t1_ibfk_4` FOREIGN KEY (`Medida_Estadistica`) REFERENCES `medida_estadistica` (`Codigo_Medida_Estadistica`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_spanish_ci;
My question: today a SELECT query using LIKE '%whatever%' normally takes 5 to 7 minutes, sometimes more. As I understand it, the varchar index is only used for patterns like 'whatever%', but I NEED to be able to search for strings with wildcards on both sides without waiting ~7 minutes per search. How can I do it?
The right way to fix the problem is to look at all the queries being run against the table, and their relative frequency. You've only given us part of one. You didn't even say which field it relates to. Since you do say "The most important column is descripcion_detallada_producto varchar(300) and it is where the selects are aimed at in 90% of the times" I'll assume that you only need to optimize
WHERE descripcion_detallada_producto LIKE '%wathever%'
As Vatev has already said, you probably should be using fulltext searches, which are semantically (and syntactically) different from LIKE predicates. Further, you should split the descripcion_detallada_producto attribute into its own relation, to reduce the buffer-flushing effects of reading huge rows into memory from disk; see the sketch below.
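A sketch of what that split could look like (the table and constraint names are assumptions):
CREATE TABLE `t1_descripcion` (
`N_orden` bigint(20) NOT NULL,
`Descripcion_Detallada_Producto` varchar(300) COLLATE latin1_spanish_ci DEFAULT NULL,
PRIMARY KEY (`N_orden`),
CONSTRAINT `t1_desc_ibfk_1` FOREIGN KEY (`N_orden`) REFERENCES `t1` (`N_orden`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_spanish_ci;
The wide text column would then be dropped from t1, keeping its rows short.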
If you are searching for entire words that may be anywhere in a text column, you should consider using fulltext indexes, which are queried differently from wildcard searches; the MATCH ... AGAINST form is sketched below.
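A minimal sketch, assuming MySQL 5.6.4+ where InnoDB supports fulltext (the index name and search term are illustrative):
ALTER TABLE `t1` ADD FULLTEXT `ft_descripcion` (`Descripcion_Detallada_Producto`);
SELECT `N_orden` FROM `t1`
WHERE MATCH(`Descripcion_Detallada_Producto`) AGAINST('whatever');
-- Caveat: fulltext matches whole words (tokens), not arbitrary substrings,
-- so it is not a drop-in replacement for LIKE '%whatever%'.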
Doing a search like the following will not use any of your indexes. Instead, it will scan through all rows of your table data, and you're subjected to disk reads (and any correlated disk fragmentation, which isn't usually a problem because we don't usually scan through whole tables):
SELECT * FROM t1
WHERE Descripcion_Detallada_Producto LIKE '%whatever%'
The following query would just scan through your index on Descripcion_Detallada_Producto which would act as a "covering" index (notice that the columns in the select make the difference):
SELECT N_orden FROM t1
WHERE Descripcion_Detallada_Producto LIKE '%whatever%'
The advantage of scanning an index instead of the actual table data is that the amount of data read during the scan is minimized, and ideally, with a large innodb_buffer_pool_size, that index would be in memory, which avoids disk seeks.
Once you get the N_orden values, then you could retrieve the individual records from the table data.
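A sketch of that two-step retrieval in a single statement (sometimes called a deferred join):
SELECT t1.*
FROM t1
JOIN (
SELECT N_orden FROM t1
WHERE Descripcion_Detallada_Producto LIKE '%whatever%'
) AS hits USING (N_orden);
-- The subquery scans only the secondary index; the outer query then
-- fetches full rows for the (presumably few) matches by primary key.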
Additional Info
Consider reducing the size of the columns (BIGINT to unsigned INT for N_orden) and reducing the size of Descripcion_Detallada_Producto. Even though a VARCHAR only uses up its actual bytes (plus a length) in the table data, each index entry uses the maximum, so reducing even a VARCHAR column's size in an index will improve index scan speed.
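A sketch of such a reduction, assuming the existing values fit the smaller types (verify your data before running anything like this):
ALTER TABLE `t1`
MODIFY `N_orden` int(10) unsigned NOT NULL,
MODIFY `Descripcion_Detallada_Producto` varchar(150) COLLATE latin1_spanish_ci DEFAULT NULL;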
In addition, if you have categories, restrict searches to selected categories and create a multi-column index on category+description. The following will only have to scan through a portion of a multi-column index on both category and description by restricting the search to a particular category:
SELECT N_orden FROM t1
WHERE Category = 1
AND Descripcion_Detallada_Producto LIKE '%whatever%'
Finally, consider removing wildcard prefixes. Make the user at least type the beginning of the model number.

MySQL long super-keys

I am currently working on a project which involves altering data stored in a MySQL database. Since the table that I am working on does not have a key, I add one with the following command:
ALTER TABLE deCoupledData ADD COLUMN MY_KEY INT NOT NULL AUTO_INCREMENT KEY
Because I want to group my records according to selected fields, I try to create an index on the table deCoupledData that consists of MY_KEY along with the selected fields. For example, if I want to work with the fields STATED_F and NOT_STATED_F, I type:
ALTER TABLE deCoupledData ADD INDEX (MY_KEY, STATED_F, NOT_STATED_F)
The real issue is that I usually work with more than 16 fields, and MySQL does not allow keys spanning more than 16 columns.
In conclusion, is there another way to do this? Can I somehow make MySQL order the records according to the desired super-key (something like clustering)? I really need to make my script faster; the main overhead is that each group may contain records that are not stored on the same disk page, and I assume my PC performs random I/Os to retrieve them.
Thank you for your time.
Nick Katsipoulakis
CREATE TABLE deCoupledData (
AA double NOT NULL DEFAULT '0',
STATED_F double DEFAULT NULL,
NOT_STATED_F double DEFAULT NULL,
MIN_VALUES varchar(128) NOT NULL DEFAULT '-1,-1',
MY_KEY int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (MY_KEY),
KEY AA (AA) )
ENGINE=InnoDB AUTO_INCREMENT=74358 DEFAULT CHARSET=latin1
Okay, first of all, when you add an index over multiple columns and you don't really use the first column, the index is useless.
Example: You have a query like
SELECT *
FROM deCoupledData
WHERE
stated_f = 5
AND not_stated_f = 10
and an index over (MY_KEY, STATED_F, NOT_STATED_F).
The index can only be used if you also have something like AND my_key = 1 in the WHERE clause.
Imagine you want to look up every person in a telephone book with first name 'John'. Then the knowledge that the book is sorted by last name is useless, you still have to look up every single name.
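For the table above, a sketch of an index the example query could actually use (leave MY_KEY out, since the WHERE clause never constrains it; the index name is illustrative):
ALTER TABLE deCoupledData ADD INDEX stated_fields (STATED_F, NOT_STATED_F);
-- Now WHERE stated_f = 5 AND not_stated_f = 10 can seek directly.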
Also, the primary key does not have to be a surrogate / artificial one. It's nearly always better to have a primary key which is made up of columns which identify each row uniquely anyway.
Also, it's not always good to have many indexes. Not only do indexes slow down INSERTs and UPDATEs; sometimes they just add an extra lookup, because the index is read first and a second read is needed to fetch the actual data.
That's just a few tips. Maybe Jordan's hint is not a bad idea, "You should maybe post a new question that has your actual SQL query, table layout, and performance questions".
UPDATE:
Yes, that is possible. According to the manual:
If you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index.
which means that the data is practically sorted on disk, yes.
Be aware that it's also possible to define a primary key over multiple columns!
Like
CREATE TABLE deCoupledData (
AA double NOT NULL DEFAULT '0',
STATED_F double NOT NULL,       -- primary-key columns must be NOT NULL
NOT_STATED_F double NOT NULL,
MIN_VALUES varchar(128) NOT NULL DEFAULT '-1,-1',
MY_KEY int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (NOT_STATED_F, STATED_F, AA),
KEY MY_KEY (MY_KEY),            -- an AUTO_INCREMENT column must still be indexed
KEY AA (AA) )
ENGINE=InnoDB AUTO_INCREMENT=74358 DEFAULT CHARSET=latin1
as long as the combination of the columns is unique.

How to optimize MySQL table containing 1.6+ million records for LIKE '%abc%' querying

I have a table with this structure and it currently contains about 1.6 million records.
CREATE TABLE `chatindex` (
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`roomname` varchar(90) COLLATE utf8_bin NOT NULL,
`username` varchar(60) COLLATE utf8_bin NOT NULL,
`filecount` int(10) unsigned NOT NULL,
`connection` int(2) unsigned NOT NULL,
`primaryip` int(10) unsigned NOT NULL,
`primaryport` int(2) unsigned NOT NULL,
`rank` int(1) NOT NULL,
`hashcode` varchar(12) COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`timestamp`,`roomname`,`username`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Both the roomname and username columns can contain exactly the same data, but the uniqueness and the important bit of each row come from combining the timestamp with those two columns.
The query that is starting to take a while (10-20 seconds) is this:
SELECT timestamp,roomname,username,primaryip,primaryport
FROM `chatindex`
WHERE username LIKE '%partialusername%'
What exactly can I do to optimize this? I can't do partialusername% because for some queries I will only have a small piece from the middle of the actual username, not the first few characters of the value.
Edit:
Also, would Sphinx be better for this particular purpose?
Use fulltext indexes; they are designed for exactly this purpose. As of MySQL 5.6.4, InnoDB supports fulltext indexes as well.
Create a fulltext index on the username column.
As an idea, you could also create views on this table containing data filtered by first letter or other criteria, and have your code decide which view to use to fetch the search results.
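A minimal sketch of the fulltext approach on MySQL 5.6.4+ with InnoDB (the index name and search term are illustrative):
ALTER TABLE `chatindex` ADD FULLTEXT `ft_username` (`username`);
SELECT `timestamp`, `roomname`, `username`, `primaryip`, `primaryport`
FROM `chatindex`
WHERE MATCH(`username`) AGAINST('partialusername');
-- Caveat: fulltext matches whole tokens, so a fragment from the middle of
-- a username will not match the way LIKE '%...%' does.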
You should use a MyISAM table for fulltext search, as MyISAM supports FULLTEXT indexes. MySQL v5.6+ is still in the development phase; you should not use it on production servers, and it may take ~1 year to go GA.
So, you should convert this table to MyISAM and add a FULLTEXT index on the column referenced in the WHERE clause:
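A sketch of that conversion (only needed on MySQL versions before InnoDB gained fulltext support):
ALTER TABLE `chatindex` ENGINE=MyISAM;
ALTER TABLE `chatindex` ADD FULLTEXT `ft_username` (`username`);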
These links can be useful:
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html
On MS SQL Server this would be a perfect case for fulltext indexes together with the CONTAINS clause. The LIKE clause cannot achieve good performance on such a big table with so many variants of text to search for.
Take a look at this link; there are many issues related to dynamic search conditions.
If you run EXPLAIN on the current query, you will see that you are doing a full table scan, which is why it is so slow. An index on username will materially speed up the search, as the index can be cached by MySQL and the table rows will only be accessed for matching users.
A fulltext index will not materially help searches like %fred% match oldfredboy etc., so I am at a loss as to why others are recommending it. What a fulltext index does is create a word-list-based index, so that when you search for something like "explain the current query", the fulltext engine intersects the row IDs containing "explain" with those containing "current" and those containing "query" to get a list of IDs containing all three. Adding a fulltext index materially increases the insert, update, and delete costs for the table, so it does carry a performance penalty. Furthermore, you need to use the fulltext-specific MATCH syntax to make full use of a fulltext index.
Do a question search on "[mysql] fulltext like" to see further discussion of this.
A normal index will do everything that you need. Searches like '%fred%' require a full scan of the index whatever you do, so you need to keep the index as lean as possible. Also, if a high percentage of hits match 'fred%', it might be worth trying a LIKE 'fred%' search first, as this will do an index range scan; see the sketch below.
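A sketch of that suggestion (the index name is illustrative):
ALTER TABLE `chatindex` ADD INDEX `idx_username` (`username`);
-- An anchored pattern can use an index range scan, so try it first when
-- the fragment is known to be a prefix:
SELECT `timestamp`, `roomname`, `username`, `primaryip`, `primaryport`
FROM `chatindex`
WHERE `username` LIKE 'fred%';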
One other point: why are you using (timestamp, roomname, username) as the primary key? This doesn't make sense to me. If you don't use the primary key as an access path, then an auto_increment id is easier. I would have thought (roomname, timestamp, username) would make more sense, as you surely tend to access rooms within a time window.
Only add indexes that you will use.
A table index (a fulltext index) is a must for such high volumes of data.
Further, if possible, partition the table; together these will definitely improve performance.