innodb_ft_min_token_size = 1 performance implications - mysql

If I change innodb_ft_min_token_size =1 from default of 3, will this cause a lot more disk usage? Any performance issues with search?
I want to be able to use fulltext search in 1 character in words.
Also once I make this change how would I rebuild the index? Will this put a lot of load on server?

There are not that many 1- and 2- letter words, so the space change may not be that great.
Modifying innodb_ft_min_token_size, innodb_ft_max_token_size, or ngram_token_size [in my.cnf] requires restarting the server.
To rebuild FULLTEXT indexes for an InnoDB table, use ALTER TABLE with the DROP INDEX and ADD INDEX options to drop and re-create each index.
-- https://dev.mysql.com/doc/refman/8.0/en/fulltext-fine-tuning.html
The "Scope" of innodb_ft_min_token_size is "Global". That is, it applies to all InnoDB FT indexes.
-- https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_ft_min_token_size
Recreating the index will read the entire table and rebuild the FT index, which will "lock" the table at some level for some period of time. The time to rebuild will be roughly proportional to the size of the table. And it will consume a bunch of extra disk space until it is finished. (The table and all the indexes will be copied over and at least the FT index will be rebuilt.)
If you have a thousand rows, no big deal. If you have a billion rows, you will need a long "downtime".
After changing the innodb_ft_max_token_size, I would be afraid to do a short wildcard test like
AGAINST('a*' IN BOOLEAN MODE)
If you have a test server, simply try it.
I noticed that the documentation recommends a value of 1 for Chinese, etc.

Related

Handling huge MyISAM table for optimisation

I have a huge (and growing) MyISAM table (700millions rows = 140Gb).
CREATE TABLE `keypairs` (
`ID` char(60) NOT NULL,
`pair` char(60) NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=MyISAM
The table option was changed to ROW_FORMAT=FIXED, cause both columns are always fixed length to max (60). And yes yes, ID is well a string sadly and not an INT.
SELECT queries are pretty ok in speed efficiency.
Databases and mysql engine are all 127.0.0.1/localhost. (nothing distant)
Sadly, INSERT is slow as hell. I dont even talk about trying to LOAD DATA millions new rows... takes days.
There won't have any concurrent read on it. All SELECTs are done one by one by only my local server.(it is not for client's use)
(for infos : files sizes .MYD=88Gb, .MYI=53Gb, .TMM=400Mb)
How could i speed up inserts into that table?
Would it help to PARTITION that huge table ? (how then?)
I heard MyISAM is using "structure cache" as .frm files. And that a line into config file is helping mysql keep in memory all the .frm (in case of partitionned), would it help also? Actualy, my .frm file is 9kb only for 700millions rows)
string shortenning/compress function... the ID string? (same idea as rainbow tables) even if it lowers the max allowed unique ID's, i will anyway never reach the max of 60chars. so maybe its an idea? but before creating a new unique ID i have to check if shortened string doesn't exists in db ofc
Same idea as shortening ID strings, what about using md5() on the ID? shorten string means faster or not in that case?
Sort the incoming data before doing the LOAD. This will improve the cacheability of the PRIMARY KEY(id).
PARTITIONing is unlikely to help, unless there is some useful pattern to ID.
PARTITIONing will not help for single-row insert nor for single-row fetch by ID.
If the strings are not a constant width of 60, you are wasting space and speed by saying CHAR instead of VARCHAR. Change that.
MyISAM's FIXED is useful only if there is a lot of 'churn' (deletes+inserts, and/or updates).
Smaller means more cacheable means less I/O means faster.
The .frm is an encoding of the CREATE TABLE; it is not relevant for this discussion.
A simple compress/zip/whatever will almost always compress text strings longer than 10 characters. And they can be uncompressed, losslessly. What do your strings look like? 60-character English text will shrink to 20-25 bytes.
MD5 is a "digest", not a "compression". You cannot recover the string from its MD5. Anyway, it would take 16 bytes after converting to BINARY(16).
The PRIMARY KEY is a BTree. If ID is somewhat "random", then the 'next' ID (unless the input is sorted) is likely not to be cached. No, the BTree is not rebalanced all the time.
Turning the PRIMARY KEY into a secondary key (after adding an AUTO_INCREMENT) will not speed things up -- it still has to update the BTree with ID in it!
How much RAM do you have? For your situation, and for this LOAD, set MyISAM's key_buffer_size to about 70% of available RAM, but not bigger than the .MYI file. I recommend a big key_buffer because that is where the random accesses are occurring; the .MYD is only being appended to (assuming you have never deleted any rows).
We do need to see your SELECTs to make sure these changes are not destroying performance somewhere else.
Make sure you are using CHARACTER SET latin1 or ascii; utf8 would waste a lot more space with CHAR.
Switching to InnoDB will double, maybe triple, the disk space for the table (data+index). Therefore, it will probably show down. But a mitigating factor is that the PK is "clustered" with the data, so you are not updating two things for each row inserted. Note that key_buffer_size should be lowered to 10M and innodb_buffer_pool_size should be set to 70% of available RAM.
(My bullet items apply to InnoDB except where MyISAM is specified.)
In using InnoDB, it would be good to try to insert 1000 rows per transaction. Less than that leads to more transaction overhead; more than that leads to overrunning the undo log, causing a different form of slowdown.
Hex ID
Since ID is always 60 hex digits, declare it to be BINARY(30) and pack them via UNHEX(...) and fetch via HEX(ID). Test via WHERE ID = UNHEX(...). That will shrink the data about 25%, and MyISAM's PK by about 40%. (25% overall for InnoDB.)
To do just the conversion to BINARY(30):
CREATE TABLE new (
ID BINARY(30) NOT NULL,
`pair` char(60) NOT NULL
-- adding the PK later is faster for MyISAM
) ENGINE=MyISAM;
INSERT INTO new
SELECT UNHEX(ID),
pair
FROM keypairs;
ALTER TABLE keypairs ADD
PRIMARY KEY (`ID`); -- For InnoDB, I would do differently
RENAME TABLE keypairs TO old,
new TO keypairs;
DROP TABLE old;
Tiny RAM
With only 2GB of RAM, a MyISAM-only dataset should use something like key_buffer_size=300M and innodb_buffer_pool_size=0. For InnoDB-only: key_buffer_size=10M and innodb_buffer_pool_size=500M. Since ID is probably some kind of digest, it will be very random. The small cache and the random key combine to mean that virtually every insert will involve a disk I/O. My first estimate would be more like 30 hours to insert 10M rows. What kind of drives do you have? SSDs would make a big difference if you don't already have such.
The other thing to do to speed up the INSERTs is to sort by ID before starting the LOAD. But that gets tricky with the UNHEX. Here's what I recommend.
Create a MyISAM table, tmp, with ID BINARY(30) and pair, but no indexes. (Don't worry about key_buffer_size; it won't be used.)
LOAD the data into tmp.
ALTER TABLE tmp ORDER BY ID; This will sort the table. There is still no index. I think, without proof, that this will be a filesort, which is much faster that "repair by key buffer" for this case.
INSERT INTO keypairs SELECT * FROM tmp; This will maximize the caching by feeding rows to keypairs in ID order.
Again, I have carefully spelled out things so that it works well regardless of which Engine keypairs is. I expect step 3 or 4 to take the longest, but I don't know which.
Optimizing a table requires that you optimize for specific queries. You can't determine the best optimization strategy unless you have specific queries in mind. Any optimization improves one type of query at the expense of other types of queries.
For example, if your query is SELECT SUM(pair) FROM keypairs (a query that would have to scan the whole table anyway), partitioning won't help, and just adds overhead.
If we assume your typical query is inserting or selecting one keypair at a time by its primary key, then yes, partitioning can help a lot. It all depends on whether the optimizer can tell that your query will find its data in a narrow subset of partitions (ideally one partition).
Also make sure to tune MyISAM. There aren't many tuning options:
Allocate key_buffer_size as high as you can spare to cache your indexes. Though I haven't ever tried anything higher than about 10GB, and I can't guarantee that MyISAM key buffers are stable at 53GB (the size of your MYI file).
Pre-load the key buffers: https://dev.mysql.com/doc/refman/5.7/en/cache-index.html
Size read_buffer_size and read_rnd_buffer_size appropriately given the queries you run. I can't give a specific value here, you should test different values with your queries.
Size bulk_insert_buffer_size to something large if you want to speed up LOAD DATA INFILE. It's 8MB by default, I'd try at least 256MB. I haven't experimented with that setting, so I can't speak from experience.
I try not to use MyISAM at all. MySQL is definitely trying to deprecate its use.
...is there a mysql command to ALTER TABLE add INT ID increment column automatically?
Yes, see my answer to https://stackoverflow.com/a/251630/20860
First, your primary key is not incrementable.
Which means, roughly: at every insert the index have to be rebalanced.
No wonder it goes slowpoke at the table of such a size.
And such an engine...
So, to the second: what's the point of keeping that MyISAM old junk?
Like, for example, you don't mind to loose row or two (or -teen) in case of an accident? And etc, etc, etc, even setting aside that current MySQL maintainer (Oracle Corp) explicitly discourages usage of MyISAM.
So, here are possible solutions:
1) Switch to Inno;
2) If you can't surrender the char ID, then:
Add autoincrement numerical key and set it primary - then, index would be clustered and the cost of insert would drop significantly;
Turn your current key into secondary index;
3) In case you can - it's obvious

Innodb_Buffer_Pool_Size Performance

I need a real and clear definition about the innodb_buffer_pool_size. It is suppose to increase performance but I'm not sure how.
Operative System: Debian 8
Memory: 1468mb
Mysql 5.5
What i did for my test was to create a table with 750001 rows with the default innodb_buffer_pool_size=134217728 and I made it on 1min 17.01sec.
INSERT INTO luis_test3 SELECT * FROM luis_test;
Query OK, 750001 rows affected (1min 17.01 sec)
After that test, I changed innodb_buffer_pool_size=402653184 and made the insert to another table (empty table) and made it on
INSERT INTO luis_test2 SELECT * FROM luis_test;
Query OK, 750001 rows affected (2min 3.17 sec)
Seems it doubles the size and no performance was improve, maybe affected on the insert but I'm not sure where will I improve performance. By the way, i have to note that I restarted my service on each change.
I have a whole server with full MyIsam tables to Innodb, but I need to follow the right conventions to set up everything on Innodb. Actually I want to prepare it for make hot backups using Percona Xtrabackup (so MyIsam is not considered for my plan).
I copy my data table sizes:
table------------------test
luis_test 316.47
luis_test2 272.83
luis_test3 272.83
luis_test_myIsam 254.16
table_prueba 0.02
None of the tables are indexed.
There are many more aspects to tuning than what you have mentioned.
The bottom line is pretty simple:
Set innodb_buffer_pool_size to about 70% of available memory. (That is, after allowing for other applications.)
Set key_buffer_size to about 40M.
To change from MyISAM to InnoDB is optimally just ALTER TABLE t ENGINE=InnoDB.
However, there are a lot of things that may cause degraded performance, especially in the way of indexes. I have enumerated them and discussed them. Probably very few will be critical.
The ALTER TABLE does virtually the same things as CREATE TABLE ...; INSERT ... SELECT; RENAME TABLE ..., just a lot shorter to type.
If, after studying my blog, you see the need to change the indexes, etc, then doing all the steps in a single ALTER is optimal:
ALTER TABLE t ADD PRIMARY KEY(foo), MODIFY COLUMN ..., ... ENGINE=InnoDB;
Yes, you need to do one table at a time.
If you have a lot of large tables, it is best to set innodb_file_per_table=ON before doing the ALTER or INSERT...SELECT.
Why did one test take twice as long as the other? Again, there are lots of things going on, so I can't give you a simple answer. One deadly possibility: Was 402653184 so big that it led to swapping? How much RAM do you have? That number would probably be too big for a 2GB server.

MySQL Full text search extremely slow on a AWS RDS large instance

I have a table having 14 million rows and i am trying to perform a full text search on this table. The query for this is performing really slow, it is taking around 9 seconds for a simple binary AND query. The same stuff executes instantly on my private cluster. Size of this table is around 3.1 GB and it contains 14 million rows. Can someone explain this behavior of RDS instance?
SELECT count(*)
FROM table_name WHERE id=97
AND match(body) against ('+data +big' IN BOOLEAN MODE)
A high IO rate often indicates insufficient memory, or buffers too small. A 3GB table, including indexes, should fit entirely in memory of a (much-less-than) 500$-per-month dedicated server.
MySQL has many different buffers, and as many parameters to fiddle with. The following buffers are the most important, compare their sizes in the two environments:
If InnoDB: innodb_buffer_pool_size
If MyISAM: key_buffer_size and read_buffer_size
have you added FULLTEXT index on body column if not then try this one surely it will make a big difference
ALTER TABLE `table_name` ADD FULLTEXT INDEX `bodytext` (`body`);
Hope it helps
Try this
SELECT count(1)
FROM table_name WHERE id=97
AND match(body) against ('+data +big' IN BOOLEAN MODE)
This should speed it up a little since you dont have to count all columns just the rows.
Can you post the explain itself?
Since DB version, table, indexes and execution plans are the same, you need to compare machine/cluster configurations. Main points of comparison CPU power available, cores used in single transaction, storage read speed, memory size and read speed/frequency. I can see Amazon provides a variety of configurations, so maybe you private cluster is much more powerful, than Amazon RDS instance config.
To add to above, you can level the load between CPU, IO and Memory to increase throughput.
Using match() against() you perform your research across your entire 3GB fulltext index and there is no way to force another index in this case.
To speed up your query you need to make your fulltext index lighter so you can:
1 - clean all the useless characters and stopwords from your fulltext index
2 - create multiple fulltext indexes and peek the appropriate one
3 - change fulltext searches to LIKE clause and force an other index such as 'id'.
Try placing id in the text index and say:
match(BODY,ID) against (+big +data +97) and id=97
You might also look at sphinx which can be used with MySQL easily.

Adding a FULLTEXT index on existing db

I have a MySQL table that has 165,716 records (and counting). The table is 233,3 MB large. Now I want to add a FULLTEXT index to a column in that table. Is that possible, or is it going to be a problem?
Kind regards
It's possible if the table is running the MyISAM engine, but it's likely to take a long time to complete the initial indexing.
[edit]I misread the size - I thought it was 2.3GB, not 233MB! If that's the case, the indexing shouldn't take that long.

MySQL: add a field to a large table

i have a table with about 200,000 records. i want to add a field to it:
ALTER TABLE `table` ADD `param_21` BOOL NOT NULL COMMENT 'about the field' AFTER `param_20`
but it seems a very heavy query and it takes a very long time, even on my Quad amd PC with 4GB of RAM.
i am running under windows/xampp and phpMyAdmin.
does mysql have a business with every record when adding a field?
or can i change the query so it makes the change more quickly?
MySQL will, in almost all cases, rebuild the table during an ALTER**. This is because the row-based engines (i.e. all of them) HAVE to do this to retain the data in the right format for querying. It's also because there are many other changes you could make which would also require rebuilding the table (such as changing indexes, primary keys etc)
I don't know what engine you're using, but I will assume MyISAM. MyISAM copies the data file, making any necessary format changes - this is relatively quick and is not likely to take much longer than the IO hardware can get the old datafile in and the new on out to disc.
Rebuilding the indexes is really the killer. Depending on how you have it configured, MySQL will either: for each index, put the indexed columns into a filesort buffer (which may be in memory but is typically on disc), sort that using its filesort() function (which does a quicksort by recursively copying the data between two files, if it's too big for memory) and then build the entire index based on the sorted data.
If it can't do the filesort trick, it will just behave as if you did an INSERT on every row, and populate the index blocks with each row's data in turn. This is painfully slow and results in far from optimal indexes.
You can tell which it's doing by using SHOW PROCESSLIST during the process. "Repairing by filesort" is good, "Repairing with keycache" is bad.
All of this will use AT MOST one core, but will sometimes be IO bound as well (especially copying the data file).
** There are some exceptions, such as dropping secondary indexes on innodb plugin tables.
You add a NOT NULL column, the tuples need to be populated. So it will be slow...
This touches each of 200.000 records, as each record needs to be updated with a new bool value which is not going to be null.
So; yes it's an expensive query... There is nothing you can do to make it faster.