MySQL full-text search extremely slow on an AWS RDS large instance - mysql

I have a table with 14 million rows (about 3.1 GB) and I am trying to perform a full-text search on it. The query below performs really slowly, taking around 9 seconds for a simple boolean AND query, while the same query executes instantly on my private cluster. Can someone explain this behavior of the RDS instance?
SELECT count(*)
FROM table_name WHERE id=97
AND match(body) against ('+data +big' IN BOOLEAN MODE)

A high IO rate often indicates insufficient memory or buffers that are too small. A 3GB table, including indexes, should fit entirely in the memory of a (much less than) $500-per-month dedicated server.
MySQL has many different buffers, and as many parameters to fiddle with. The following buffers are the most important, compare their sizes in the two environments:
If InnoDB: innodb_buffer_pool_size
If MyISAM: key_buffer_size and read_buffer_size
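A quick way to compare the two environments is to check these settings on each server (these are standard MySQL system variables, so the same statements work on both RDS and a private cluster):

```sql
-- Run on both the RDS instance and the private cluster, then compare.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'key_buffer_size';
SHOW VARIABLES LIKE 'read_buffer_size';
```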

Have you added a FULLTEXT index on the body column? If not, try this one; surely it will make a big difference:
ALTER TABLE `table_name` ADD FULLTEXT INDEX `bodytext` (`body`);
Hope it helps

Try this
SELECT count(1)
FROM table_name WHERE id=97
AND match(body) against ('+data +big' IN BOOLEAN MODE)
In theory this counts rows rather than all columns, though in practice MySQL optimizes COUNT(*) and COUNT(1) identically, so don't expect a big difference.
Can you post the explain itself?

Since the DB version, table, indexes, and execution plans are the same, you need to compare machine/cluster configurations. The main points of comparison: available CPU power, cores used in a single transaction, storage read speed, memory size, and memory read speed/frequency. Amazon provides a variety of configurations, so maybe your private cluster is much more powerful than the Amazon RDS instance config.
To add to the above, you can balance the load between CPU, IO, and memory to increase throughput.

Using match() against(), your search runs across your entire 3GB fulltext index, and there is no way to force another index in this case.
To speed up your query you need to make your fulltext index lighter, so you can:
1 - clean all the useless characters and stopwords out of your fulltext index
2 - create multiple fulltext indexes and pick the appropriate one
3 - change fulltext searches to a LIKE clause and force another index such as 'id'.
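A minimal sketch of option 3, assuming the table has an ordinary index on id (the index name `id_idx` here is hypothetical):

```sql
-- Force the id index so only one id's rows are scanned, then filter with LIKE.
-- A leading-wildcard LIKE cannot use an index on body, but the id index
-- keeps the scanned row set small.
SELECT COUNT(*)
FROM table_name FORCE INDEX (id_idx)
WHERE id = 97
  AND body LIKE '%data%'
  AND body LIKE '%big%';
```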

Try placing id in the fulltext index and say:
match(body, id) against ('+big +data +97' IN BOOLEAN MODE) and id=97
(Note the search string must be quoted, and the fulltext index must cover both columns.) You might also look at Sphinx, which can be used with MySQL easily.

Related

RDS InnoDB RAM usage issue

I've been using AWS RDS for a long time in a production environment.
I started monitoring its memory usage, especially the InnoDB stats.
Almost the whole buffer pool is full, but I know that the indexes I created are not that big. The database has 32GB of RAM. MySQL version 5.7.22.
After further digging I spotted a huge number of pages being used by the CLUST_IND index of the SYS_TABLES table, plus pages of type "Unknown". I am wondering if there is anything that can be done to clean this up? Any advice would be appreciated.
The query:
SELECT
  table_name AS Table_Name, index_name AS Index_Name,
  COUNT(*) AS Page_Count, SUM(data_size)/1024/1024 AS Size_in_MB
FROM information_schema.innodb_buffer_page
GROUP BY table_name, index_name
ORDER BY Size_in_MB DESC;
and result:
information_schema.innodb_sys_tables is an internal list of all the tables in the system. Do not mess with it. Do not worry about its space usage. "CLUST_IND" is an artificial Primary Key.
At least in the case of MySQL 5.6, it does not occupy disk space; it is in RAM. And it does not actually take 4093MB. That computation is bogus for certain system tables, such as this.
Do you have thousands of tables? If so, consider whether you need all of them. Each one is taking a little space in sys_tables.
The buffer pool is a cache, so it is normal for it to be nearly full most of the time. How full it is, by itself, is irrelevant.

Creating an index taking ages

I have a table with three columns: id, hash, and name. I have to query this table with a WHERE clause on the hash value, which is a string. I have already loaded 100 million entries into it. I am trying to create an index using the following command:
CREATE INDEX i ON table (sequence) using HASH
How much time will it take, and how much speedup will I gain once I have this index? I am using MySQL, by the way.
Indexing will definitely help.
For better performance on MyISAM you should look at tuning these variables: key_buffer_size, myisam_max_sort_file_size, and myisam_sort_buffer_size.
I would recommend using InnoDB though.
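As a hedged sketch, the session could be tuned before the index build like this (the variable name is a real MyISAM setting; the size and the column name `hash_col` are only illustrative):

```sql
-- A larger sort buffer speeds up MyISAM index builds done via filesort.
SET SESSION myisam_sort_buffer_size = 256 * 1024 * 1024;

-- Note: InnoDB and MyISAM silently build a B-tree even if you write USING HASH.
CREATE INDEX i ON my_table (hash_col);
```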

Performance difference between Innodb and Myisam in Mysql

I have a MySQL table with over 30 million records that was originally stored with MyISAM. Here is a description of the table:
I would run the following query against this table, which would generally take around 30 seconds to complete. I changed #eid each time to avoid database or disk caching.
select count(fact_data.id)
from fact_data
where fact_data.entity_id=#eid
and fact_data.metric_id=1
I then converted this table to InnoDB without making any other changes, and afterwards the same query returns in under a second every single time I run it. Even when I randomly set #eid to avoid caching, the query returns in under a second.
I've been researching the differences between the two storage types to try to explain the dramatic improvement in performance but haven't been able to come up with anything. In fact, much of what I read indicates that Myisam should be faster.
The queries I'm running are against a local database with no other processes hitting the database at the time of the tests.
That's a surprisingly large performance difference, but I can think of a few things that may be contributing.
MyISAM has historically been viewed as faster than InnoDB, but for recent versions of InnoDB, that is true for a much, much smaller set of use cases. MyISAM is typically faster for table scans of read-only tables. In most other use cases, I typically find InnoDB to be faster. Often many times faster. Table locks are a death knell for MyISAM in most of my usage of MySQL.
MyISAM caches indexes in its key buffer. Perhaps you have set the key buffer too small for it to effectively cache the index for your somewhat large table.
MyISAM depends on the OS to cache table data from the .MYD files in the OS disk cache. If the OS is running low on memory, it will start dumping its disk cache. That could force it to keep reading from disk.
InnoDB caches both indexes and data in its own memory buffer. You can tell the OS not to also use its disk cache if you set innodb_flush_method to O_DIRECT, though this isn't supported on OS X.
InnoDB usually buffers data and indexes in 16kb pages. Depending on how you are changing the value of #eid between queries, it may have already cached the data for one query due to the disk reads from a previous query.
Make sure you created the indexes identically. Use explain to check if MySQL is using the index. Since you included the output of describe instead of show create table or show indexes from, I can't tell if entity_id is part of a composite index. If it was not the first part of a composite index, it wouldn't be used.
If you are using a relatively modern version of MySQL, run the following command before running the query:
set profiling = 1;
That will turn on query profiling for your session. After running the query, run
show profiles;
That will show you the list of queries for which profiles are available. I think it keeps the last 20 by default. Assuming your query was the first one, run:
show profile for query 1;
You will then see the duration of each stage in running your query. This is extremely useful for determining what (e.g., table locks, sorting, creating temp tables, etc.) is causing a query to be slow.
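Putting those profiling steps together (the #eid value here is a placeholder):

```sql
SET profiling = 1;

SELECT COUNT(fact_data.id)
FROM fact_data
WHERE fact_data.entity_id = 12345   -- substitute your #eid
  AND fact_data.metric_id = 1;

SHOW PROFILES;              -- list recent queries with their query numbers
SHOW PROFILE FOR QUERY 1;   -- per-stage timing for the query above
```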
My first suspicion would be that the original MyISAM table and/or indexes became fragmented over time resulting in the performance slowly degrading. The InnoDB table would not have the same problem since you created it with all the data already in it (so it would all be stored sequentially on disk).
You could test this theory by rebuilding the MyISAM table. The easiest way to do this would be to use a "null" ALTER TABLE statement:
ALTER TABLE mytable ENGINE = MyISAM;
Then check the performance to see if it is better.
Another possibility would be if the database itself is simply tuned for InnoDB performance rather than MyISAM. For example, InnoDB uses the innodb_buffer_pool_size parameter to know how much memory should be allocated for storing cached data and indexes in memory. But MyISAM uses the key_buffer parameter. If your database has a large innodb buffer pool and a small key buffer, then InnoDB performance is going to be better than MyISAM performance, especially for large tables.
What are your index definitions? There are ways of creating MyISAM indexes in which your indexed fields will not be used when you think they would be.

MySQL Huge table select performance

I currently have a table with 10 million rows and need to increase the performance drastically.
I have thought about dividing this one table into 20 smaller tables of 500k rows each, but I could not get an increase in performance.
I have created 4 indexes on 4 columns and converted all the columns to INTs, and I have another column that is a BIT.
My basic query is select primary from from mytable where column1 = int and bitcolumn = b'1', and this is still very slow. Is there anything I can do to increase the performance?
Server Spec
32GB Memory, 2TB storage, and using the standard ini file, also my processor is AMD Phenom II X6 1090T
In addition to giving the MySQL server more memory to play with, remove unnecessary indexes and make sure you have an index on column1 (in your case). Add a LIMIT clause to the SQL if possible.
Download this (on your server):
MySQLTuner.pl
Install it, run it and see what it says - even better paste the output here.
There is not enough information to reliably diagnose the issue, but you state that you're using "the default" my.cnf / my.ini file on a system with 32G of memory.
From the MySQL Documentation the following pre-configured files are shipped:
Small: System has <64MB memory, and MySQL is not used often.
Medium: System has at least 64MB memory
Large: System has at least 512MB memory and the server will run mainly MySQL.
Huge: System has at least 1GB memory and the server will run mainly MySQL.
Heavy: System has at least 4GB memory and the server will run mainly MySQL.
Best case, you're using a configuration file that utilizes 1/8th of the memory on your system (if you are using the "Heavy" file, which as far as I recall is not the default one; I think the default is Medium or perhaps Large).
I suggest editing your my.cnf file appropriately.
There are several areas of MySQL for which the memory allocation can be tweaked to maximize performance for your particular case. You can post your my.cnf / my.ini file here for more specific advice. You can also use MySQL Tuner to get some automated advice.
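As a rough, illustrative sketch only (the right values depend on your workload; these numbers are starting points, not prescriptions), a my.cnf for a dedicated 32GB MySQL server might begin with something like:

```
[mysqld]
# InnoDB: commonly 50-75% of RAM on a dedicated database server
innodb_buffer_pool_size = 20G
# MyISAM key cache (only matters if you still have MyISAM tables)
key_buffer_size = 1G
```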
I did something that made a big difference in the query time,
but it may not be useful for all cases, just in mine.
I have a huge table (about 2,350,000 records), but I can predict the region of the table I need to search,
so I added the condition WHERE id > '2300000'. As I said, this is my case, but it may help others.
So the full query will be:
SELECT primary from mytable where id > '2300000' AND column1 = int AND bitcolumn = b'1'
The query time was 2~3 seconds, and now it is less than 0.01.
First of all, your query
select primary from from mytable where column1 = int and bitcolumn = b'1'
has some errors, like two FROM clauses. Second, splitting the table and using unnecessary indexes never helps performance. Some tips to follow are:
1) Use a composite index if you repeatedly query some columns together. But take precautions, because in a composite index the order of the columns matters a lot.
2) A primary key is more helpful if it is on an INT column.
3) Read some articles on indexes and optimization; there are many, search on Google.
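For tip 1, a composite index matching the query in the question might look like this (the index name is hypothetical; the equality columns the query filters on come first):

```sql
-- Column order matters: column1 = ? and bitcolumn = ? can both be
-- satisfied from this one index.
ALTER TABLE mytable ADD INDEX idx_col1_bit (column1, bitcolumn);
```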

Which is faster, key_cache or OS cache?

In a table with 1 million rows, if I do (after I restart the computer, so nothing is cached):
1. SELECT price,city,state FROM tb1 WHERE zipId=13458;
the result is 23 rows in 0.270s
After I run 'LOAD INDEX INTO CACHE tb1' (key_buffer_size=128M and the total index size for the table is 82M):
2. SELECT price,city,state FROM tb1 WHERE zipId=24781;
the result is 23 rows in 0.252s; Key_reads remains constant, Key_read_requests is incremented by 23
BUT after I load 'zipId' into the OS cache, if I run the query again:
3. SELECT price,city,state FROM tb1 WHERE zipId=20548;
the result is 22 rows in 0.006s
This is just a simple example, but I ran tens of tests and combinations, and the results are always the same.
I use MySQL with MyISAM, Windows 7 64-bit, and the query_cache is 0;
zipId is a regular index (not the primary key)
SHOULDN'T the key_cache be faster than the OS cache??
SHOULDN'T there be a huge difference in speed after I load the index into the cache??
(In my tests there is almost no difference.)
I've read a lot of websites, tutorials, and blogs on this matter, but none of them really discuss the difference in speed. So any ideas or links will be greatly appreciated.
Thank you.
Under normal query processing, MySQL scans the index for the WHERE clause values (i.e. zipId = 13458), then uses the index to look up the corresponding rows in the MyISAM main table (a second disk access). When the table is in the OS cache, those disk accesses are all served from memory rather than from the actual disk.
The slow part of the query is the lookup from the index into the main table, so loading only the index into memory may not improve the query speed.
One thing to try is Explain Select on your queries to see how the index is being used.
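For example, with the first query from the question:

```sql
EXPLAIN SELECT price, city, state FROM tb1 WHERE zipId = 13458;
-- Check the `key` and `rows` columns of the output to see whether the
-- zipId index is chosen and how many rows MySQL expects to examine.
```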
Edit: Since I don't think the answers to your comments will fit in the comment space, I'll answer them here.
MyISAM in and of itself does not have a data cache; it relies upon the OS to do the disk caching. How much of your table is cached depends upon what else is running on the system and how much data you are reading through. Windows in particular does not give the user much control over what data is cached and for how long.
The OS caches disk blocks (either 4K or 8K chunks) of the index file or the full table file.
SELECT indexed_col FROM tb1 WHERE zipId+0>1
Queries like this where you use functions on the predicate (Where clause) can cause MySQL to do full table scans rather than using any index. As I suggested above, use EXPLAIN SELECT to see what MySQL is doing.
If you want more control over the cache, try using an InnoDB table. The InnoDB engine creates its own cache, which you can size, and it does a better job of keeping the most recently used data in it.