MySQL Huge table select performance

I currently have a table with 10 million rows and need to increase the performance drastically.
I have thought about dividing this one table into 20 smaller tables of 500k rows each, but I could not get an increase in performance.
I have created 4 indexes for 4 columns and converted all the columns to INTs, and I have another column that is a BIT.
My basic query is select primary from from mytable where column1 = int and bitcolumn = b'1'. This is still very slow; is there anything I can do to increase the performance?
Server Spec
32GB memory, 2TB storage, using the standard ini file; the processor is an AMD Phenom II X6 1090T.

In addition to giving the MySQL server more memory to play with, remove unnecessary indexes and make sure you have an index on column1 (in your case). Add a LIMIT clause to the SQL if possible; a sketch of both follows.
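For example, a minimal sketch using the table and column names from the question; the index name is made up, primary_col stands in for the actual select column (PRIMARY itself is a reserved word), and 123 is just a placeholder value:
-- add an index on the column used in the WHERE clause
ALTER TABLE mytable ADD INDEX idx_column1 (column1);
-- cap the result set if the application does not need every matching row
SELECT primary_col FROM mytable WHERE column1 = 123 AND bitcolumn = b'1' LIMIT 100;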

Download this (on your server):
MySQLTuner.pl
Install it, run it, and see what it says - even better, paste the output here.

There is not enough information to reliably diagnose the issue, but you state that you're using "the default" my.cnf / my.ini file on a system with 32G of memory.
According to the MySQL documentation, the following pre-configured files are shipped:
Small: System has <64MB memory, and MySQL is not used often.
Medium: System has at least 64MB memory.
Large: System has at least 512MB memory and the server will run mainly MySQL.
Huge: System has at least 1GB memory and the server will run mainly MySQL.
Heavy: System has at least 4GB memory and the server will run mainly MySQL.
Best case, you're using a configuration file that utilizes 1/8th of the memory on your system (if you are using the "Heavy" file, which as far as I recall is not the default; I think the default is Medium or perhaps Large).
I suggest editing your my.cnf file appropriately.
There are several areas of MySQL for which the memory allocation can be tweaked to maximize performance for your particular case. You can post your my.cnf / my.ini file here for more specific advice. You can also use MySQL Tuner to get some automated advice.
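As a starting point, you can check how far off the defaults are from a client session. This is a rough sketch: the 24G figure is only an illustrative value for a dedicated 32GB box, and SET GLOBAL works for the buffer pool only on MySQL 5.7.5 or later - on older versions the change has to go into my.cnf followed by a restart.
-- current settings (likely tiny with a default configuration file)
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'key_buffer_size';
-- illustrative resize; only dynamic on MySQL 5.7.5+, otherwise set it in my.cnf
SET GLOBAL innodb_buffer_pool_size = 24 * 1024 * 1024 * 1024;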

I made a change that makes a big difference in the query time, but it may not be useful for all cases, just in mine.
I have a huge table (about 2,350,000 records), but I know roughly which range of rows I need to look at,
so I added the condition WHERE id > '2300000'. As I said, this is my case, but it may help others,
so the full query will be:
SELECT primary from mytable where id > '2300000' AND column1 = int AND bitcolumn = b'1'
The query time was 2~3 seconds, and now it is less than 0.01.

First of all, your query
select primary from from mytable where column1 = int and bitcolumn = b'1'
has some errors, such as two FROM clauses. Second, splitting the table and adding unnecessary indexes does not help performance. Some tips to follow:
1) Use a composite index if you repeatedly query some columns together (see the sketch after this list). Take care, though, because in a composite index the order of the columns matters a lot.
2) The primary key is more helpful if it is on an INT column.
3) Read some articles on indexes and optimization; there are plenty, search on Google.
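For tip 1, a minimal sketch against the query from the question; the index name is made up, primary_col and 123 are placeholders, and EXPLAIN is there to confirm the index is actually chosen:
-- put the columns that are queried together into one composite index
ALTER TABLE mytable ADD INDEX idx_col1_bit (column1, bitcolumn);
EXPLAIN SELECT primary_col FROM mytable WHERE column1 = 123 AND bitcolumn = b'1';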


MySQL Full text search extremely slow on an AWS RDS large instance

I have a table with 14 million rows and I am trying to perform a full-text search on it. The query is performing really slowly; it takes around 9 seconds for a simple Boolean AND query. The same query executes instantly on my private cluster. The size of this table is around 3.1 GB. Can someone explain this behavior of the RDS instance?
SELECT count(*)
FROM table_name WHERE id=97
AND match(body) against ('+data +big' IN BOOLEAN MODE)
A high I/O rate often indicates insufficient memory or buffers that are too small. A 3GB table, including indexes, should fit entirely in the memory of a (much less than) $500-per-month dedicated server.
MySQL has many different buffers, and as many parameters to fiddle with. The following buffers are the most important, compare their sizes in the two environments:
If InnoDB: innodb_buffer_pool_size
If MyISAM: key_buffer_size and read_buffer_size
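A quick way to compare them is to run the same statement on both servers and diff the output (the list simply names the buffers mentioned above):
-- run on both the RDS instance and the private cluster
SHOW VARIABLES WHERE Variable_name IN
    ('innodb_buffer_pool_size', 'key_buffer_size', 'read_buffer_size');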
Have you added a FULLTEXT index on the body column? If not, try this one; it will surely make a big difference:
ALTER TABLE `table_name` ADD FULLTEXT INDEX `bodytext` (`body`);
Hope it helps
Try this
SELECT count(1)
FROM table_name WHERE id=97
AND match(body) against ('+data +big' IN BOOLEAN MODE)
This should speed it up a little, since you don't have to count all columns, just the rows.
Can you post the EXPLAIN output itself?
Since the DB version, table, indexes and execution plans are the same, you need to compare machine/cluster configurations. The main points of comparison are CPU power available, cores used in a single transaction, storage read speed, memory size, and memory read speed/frequency. Amazon provides a variety of configurations, so maybe your private cluster is much more powerful than the Amazon RDS instance configuration.
To add to the above, you can balance the load between CPU, I/O and memory to increase throughput.
Using MATCH() AGAINST() you search across your entire 3GB fulltext index, and there is no way to force another index in this case.
To speed up your query you need to make your fulltext index lighter, so you can:
1 - clean all the useless characters and stopwords from your fulltext index
2 - create multiple fulltext indexes and pick the appropriate one
3 - change fulltext searches to a LIKE clause and force another index such as 'id' (see the sketch after this list).
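A rough sketch of option 3 using the query from the question; idx_id is an assumed index on the id column, and leading-wildcard LIKE patterns still have to scan every row matching id = 97:
-- replace MATCH ... AGAINST with LIKE and force the (assumed) index on id
SELECT count(*)
FROM table_name FORCE INDEX (idx_id)
WHERE id = 97
  AND body LIKE '%data%'
  AND body LIKE '%big%';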
Try placing id in the fulltext index and say:
MATCH(body, id) AGAINST ('+big +data +97' IN BOOLEAN MODE) AND id=97
You might also look at Sphinx, which can be used with MySQL easily.

mysql speed, table index and select/update/insert

We have a MySQL table which has more than 7,000,000 (yes, seven million) rows.
We are constantly running many SELECT / INSERT / UPDATE queries every 5 seconds.
Is it a good idea to create a MySQL index for that table? Will there be bad consequences like data corruption or losing MySQL service, etc.?
A little info:
MySQL version 5.1.56
Server CentOS
Table engines are MyISAM
MySQL CPU load is always between 200% and 400%
In general, indexes will improve the speed of SELECT operations and will slow down INSERT/UPDATE/DELETE operations, as both the base table and the indexes must be modified when a change occurs.
It is very difficult to say. I would expect the index build itself to take some time, but after that you should see some improvement. As said by @Joe and @Patrick, it might hurt your modification time, but selecting will be faster.
Of course, there are other ways of improving insert and update performance. You could, for instance, batch updates if it is not important to have changes visible immediately.
The indexes will help dramatically with selects, especially if they match up well with the commonly filtered fields and you have a good, simple primary key. They will help with both query time and processing cycles.
The drawbacks are if you are very often updating/altering/deleting these records, especially the indexed fields. Even in this case though, it is often worth it.
How much you're going to be reporting (SELECT statements) vs. updating should hugely affect both your initial design and your later adjustments once your DB is in the wild. Since you already have what you have, testing will give you the answers you need. If you really do a lot of SELECT queries and a lot of updating, your solution might be to copy data out now and then to a reporting table; then you can index like crazy with no ill effects (see the sketch after this answer).
You have actually asked a large question, and you should study up on this more. The general points I've mentioned above hold for most relational DBs, but there are also particular behaviors of the particular database (MySQL in your case), mainly in how it decides when and where to use indexes.
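A minimal sketch of the reporting-table idea; all names here are placeholders, and how you refresh the copy (e.g. a nightly job) is up to you:
-- copy the structure, bulk-copy the data, then index the copy as heavily as needed
CREATE TABLE orders_report LIKE orders;
INSERT INTO orders_report SELECT * FROM orders;
ALTER TABLE orders_report ADD INDEX idx_report_status (status);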
If you are looking for performance, indexes are the way to go. Indexes speed up your queries. If you have 7 million records, your queries are probably taking many seconds, possibly a minute, depending on your memory size.
Generally speaking, I would create indexes that match the most frequent SELECT statements. Everyone talks about the negative impact of indexes on table size and speed, but I would ignore those impacts unless you have a table where 95% of the activity is inserts and updates. Even then, if those inserts happen at night and you query during the day, go and create those indexes; your daytime users will appreciate it.
What is the actual time impact on an insert or update statement if there is an additional index, 0.001 seconds maybe? If the index saves you many seconds per query, I guess the additional time required to update the index is well worth it.
The only time I ever had an issue with creating an index (it actually broke the program logic) was when we added a primary key to a table that had previously been created (by someone else) without one, and the program expected the SELECT statement to return records in the sequence they were created. Creating the primary key changed that: records selected without any WHERE clause were returned in a different sequence.
This is obviously a wrong design in the first place; nevertheless, if you have an older program and you encounter tables without a primary key, I suggest looking at the code that reads those tables before adding one, just in case.
One last thought about creating indexes: the choice of fields and the order in which the fields appear in the index have an impact on the performance of the index (see the sketch below).
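To illustrate with made-up names: a composite index is only usable when its leftmost column(s) are constrained.
ALTER TABLE orders ADD INDEX idx_cust_date (customer_id, created_at);
-- can use idx_cust_date: the leftmost column is constrained
EXPLAIN SELECT * FROM orders WHERE customer_id = 42 AND created_at >= '2012-01-01';
-- cannot use idx_cust_date efficiently: the leftmost column is not constrained
EXPLAIN SELECT * FROM orders WHERE created_at >= '2012-01-01';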
I had the same kind of problem that you describe.
I made a few changes and one query went from 11 seconds to a few milliseconds:
1- Upgraded to MariaDB 10.1
2- Changed ALL my DB tables to the Aria engine
3- Changed my.cnf to the strict minimum
4- Upgraded to PHP 7.1 (but this one had little impact)
5- On CentOS: "yum update" in the terminal or via SSH (keeping everything up to date)
1- MariaDB is the community-developed, open-source fork of MySQL
2- The Aria engine is MariaDB's crash-safe evolution of MyISAM
3- my.cnf usually has too many changed settings that affect performance
Here is an example:
[mysqld]
performance-schema=1
general_log=0
slow_query_log=0
max_allowed_packet=268435456
By removing all extra options from my.cnf, you tell MySQL to use its default values.
In MySQL 5 (5.1, 5.5, 5.6...), when I did that I only noticed a small difference.
But in MariaDB, a small my.cnf like this made a BIG difference.
For ALL of those changes, the server hardware remained the same.
Hope it can help you

Which is faster, key_cache or OS cache?

In a table with 1 million rows, if I do (after I restart the computer, so nothing is cached):
1. SELECT price,city,state FROM tb1 WHERE zipId=13458;
the result is 23 rows in 0.270s
after I run 'LOAD INDEX INTO CACHE tb1' (key_buffer_size=128M and the total index size for the table is 82M):
2. SELECT price,city,state FROM tb1 WHERE zipId=24781;
the result is 23 rows in 0.252s; Key_reads remains constant, Key_read_requests is incremented by 23
BUT after I load zipId into the OS cache, if I run the query again:
2. SELECT price,city,state FROM tb1 WHERE zipId=20548;
the result is 22 rows in 0.006s
This is just a simple example, but I have run tens of tests and combinations, and the results are always the same.
I use MySQL with MyISAM, Windows 7 64-bit, and the query cache is 0.
zipId is a regular index (not the primary key)
SHOULDN'T the key cache be faster than the OS cache?
SHOULDN'T there be a huge difference in speed after I load the index into the cache?
(In my tests there is almost no difference.)
I've read a lot of websites, tutorials and blogs on this matter, but none of them really discuss the difference in speed. So any ideas or links will be greatly appreciated.
Thank you.
Under normal query processing, MySQL will scan the index for the WHERE clause value (i.e. zipId = 13458) and then use the index to look up the corresponding rows in the MyISAM main table (a second disk access). When you load the table into memory, those disk accesses are served from memory instead of from a real disk read.
The slow part of the query is the lookup from the index into the main table. So loading the index into memory may not improve the query speed.
One thing to try is EXPLAIN SELECT on your queries to see how the index is being used.
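For instance, with the first query from the question (the key column of the output shows which index, if any, is being used):
EXPLAIN SELECT price, city, state FROM tb1 WHERE zipId = 13458;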
Edit: Since I don't think the answers to your comments will fit in the comment space, I'll answer them here.
MyISAM in and of itself does not cache table data (it has only the key cache, for indexes); it relies upon the OS to do the disk caching of the data file. How much of your table is cached depends upon what else you are running on the system and how much data you are reading through. Windows in particular does not allow the user much control over what data is cached and for how long.
The OS caches disk blocks (either 4K or 8K chunks) of the index file or the full table file.
SELECT indexed_col FROM tb1 WHERE zipId+0>1
Queries like this, where you apply a function or expression to the indexed column in the predicate (WHERE clause), can cause MySQL to do a full table scan rather than use any index. As I suggested above, use EXPLAIN SELECT to see what MySQL is doing.
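For comparison, a quick sketch of the same predicate rewritten so the indexed column stands alone, which lets MySQL consider the zipId index again:
-- no expression wrapped around zipId, so the index is usable
SELECT indexed_col FROM tb1 WHERE zipId > 1;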
If you want more control over the cache, try using an InnoDB table. The InnoDB engine has its own cache (the buffer pool) which you can size, and it does a better job of keeping the most recently used data in it.

Insertion speed slowdown as the table grows in mysql

I am trying to get a better understanding of insertion speed and performance patterns in MySQL for a custom product. I have two tables to which I keep appending new rows. The two tables are defined as follows:
CREATE TABLE events (
added_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
id BINARY(16) NOT NULL,
body MEDIUMBLOB,
UNIQUE KEY (id)) ENGINE InnoDB;
CREATE TABLE index_fpid (
fpid VARCHAR(255) NOT NULL,
event_id BINARY(16) NOT NULL UNIQUE,
PRIMARY KEY (fpid, event_id)) ENGINE InnoDB;
And I keep inserting new objects into both tables (for each new object, I insert the relevant information into both tables in one transaction). At first, I get around 600 insertions/sec, but after ~30000 rows I see a significant slowdown (around 200 insertions/sec), and then a slower but still noticeable further decline.
I can see that as the table grows, the I/O wait numbers get higher and higher. My first thought was memory taken by the index, but this is all on a VM that has 768 MB and is dedicated to this task alone (2/3 of the memory is unused). Also, I have a hard time seeing 30000 rows taking that much memory, let alone just the indexes (the whole MySQL data dir is < 100 MB anyway). To confirm this, I allocated very little memory to the VM (64 MB), and the slowdown pattern is almost identical (i.e. the slowdown appears after the same number of insertions), so I suspect a configuration issue, especially since I am relatively new to databases.
The pattern looks as follows:
I have a self-contained Python script which reproduces the issue, which I can make available if that's helpful.
Configuration:
Ubuntu 10.04, 32-bit, running on KVM with 760 MB allocated to it.
MySQL 5.1, out-of-the-box configuration with separate files for tables (innodb_file_per_table)
[EDIT]
Thank you very much to Eric Holmberg; he nailed it. Here are the graphs after setting innodb_buffer_pool_size to a reasonable value:
Edit your /etc/mysql/my.cnf file and make sure you allocate enough memory to the InnoDB buffer pool. If this is a dedicated server, you could probably use up to 80% of your system memory.
# Provide a buffer pool for InnoDB - up to 80% of memory for a dedicated database server
innodb_buffer_pool_size=614M
The primary keys are B-trees, so inserts will always take O(log N) time, and once you run out of cache they will start swapping like mad. When this happens, you will probably want to partition the data to keep your insertion speed up. See http://dev.mysql.com/doc/refman/5.1/en/partitioning.html for more info on partitioning.
Good luck!
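As a hedged sketch of what partitioning the events table could look like: MySQL requires the partitioning column to appear in every unique key, so the UNIQUE KEY on id would have to be widened to (id, added_id), which weakens that constraint and may or may not be acceptable for your data.
CREATE TABLE events_partitioned (
    added_id INT NOT NULL AUTO_INCREMENT,
    id BINARY(16) NOT NULL,
    body MEDIUMBLOB,
    PRIMARY KEY (added_id),
    UNIQUE KEY (id, added_id)   -- widened to include the partitioning column
) ENGINE InnoDB
PARTITION BY RANGE (added_id) (
    PARTITION p0 VALUES LESS THAN (1000000),
    PARTITION p1 VALUES LESS THAN (2000000),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);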
Your indexes may just need to be analyzed and optimized during the insert; they gradually get out of shape as you go along. The other option, of course, is to disable indexes entirely while you're inserting and rebuild them later, which should give more consistent performance.
Great link about insert speed.
ANALYZE TABLE / OPTIMIZE TABLE
Verifying that the insert doesn't violate a key constraint takes some time, and that time grows as the table gets larger. If you're interested in flat-out performance, using LOAD DATA INFILE will improve your insert speed considerably.
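A minimal sketch of the LOAD DATA INFILE route for the events table from the question; the file path and format clauses are hypothetical and must match however you export the data:
-- '/tmp/events.csv' is a made-up path that must be readable by the MySQL server
LOAD DATA INFILE '/tmp/events.csv'
INTO TABLE events
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(id, body);   -- added_id is filled in by AUTO_INCREMENT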

MySQL ALTER TABLE on very large table - is it safe to run it?

I have a MySQL database with a MyISAM table with 4 million rows. I update this table about once a week with about 2000 new rows. After updating, I then alter the table like this:
ALTER TABLE x ORDER BY PK DESC
I order the table by the primary key field in descending order. This has not given me any problems on my development machine (Windows with 3GB memory). Three times I have run it successfully on the production Linux server (with 512MB RAM), getting the resulting sorted table in about 6 minutes each time. The last time I tried it, I had to stop the query after about 30 minutes and rebuild the database from a backup.
Can a 512MB server cope with that alter statement on such a large table? I have read that a temporary table is created to perform the ALTER TABLE command.
Question: Can this alter command be safely run? What should be the expected time for the alteration of the table?
As I have just read, the ALTER TABLE ... ORDER BY ... query is useful to improve performance in certain scenarios. I am surprised that the PK index does not help with this, but from the MySQL docs it seems that InnoDB does use the index. However, InnoDB tends to be slower than MyISAM. That said, with InnoDB you wouldn't need to re-order the table, but you would lose the blazing speed of MyISAM. It still may be worth a shot.
The way you explain the problems, it seems that there is too much data loaded into memory (maybe there is even swapping going on?). You could easily check that by monitoring your memory usage. It's hard to say, as I do not know MySQL all that well.
On the other hand, I think your problem lies in a very different place: you are using a machine with only 512 MB of RAM as a database server, with a table containing more than 4 million rows... and you are performing a very memory-heavy operation on the whole table on that machine. It seems that 512 MB will not nearly be enough for that.
A much more fundamental issue I am seeing here: you are doing development (and quite likely testing as well) in an environment that is very different from the production environment. The kind of problem you are describing is to be expected. Your development machine has six times as much memory as your production machine, and I believe I can safely say that the processor is much faster as well. In that case, I suggest you create a virtual machine mimicking your production site. That way you can easily test your project without disrupting the production site.
What you're asking it to do is rebuild the entire table and all its indexes; this is an expensive operation. It will complete, but it will be vastly slower if the data doesn't fit in RAM, particularly if you have lots of indexes.
I question your judgement when choosing to run a machine with such tiny memory in production. Anyway:
Is this ALTER TABLE really necessary; what specific query are you trying to speed up, and have you tried it without?
Have you considered making your development machine more like production? I mean, using a dev box with MORE memory than production is never a good idea, and using a different OS definitely isn't either.
There is probably also some tuning you can do to try to help; it largely depends on your schema (indexes in particular). 4M rows is not very many (for a machine with normal amounts of RAM).
Is the primary key AUTO_INCREMENT? If so, then doing ALTER TABLE ... ORDER BY isn't going to improve anything, since everything will be inserted in order anyway.
(unless you have lots of deletes)
I'd probably create a view instead, ordered by the PK value, so that for one thing you don't need to lock up that huge table while the ALTER is being performed.
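A minimal sketch of that idea, keeping the table name x and primary-key column PK from the question:
-- presents rows in descending PK order without physically reordering the table
CREATE VIEW x_desc AS
SELECT * FROM x ORDER BY PK DESC;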
If you're using InnoDB, you shouldn't have to explicitly perform the ORDER BY either post-insert or at query time. According to the MySQL 5.0 manual, InnoDB already defaults to primary key ordering for query results:
http://dev.mysql.com/doc/refman/5.0/en/alter-table.html#id4052480
MyISAM tables, by contrast, return records in insertion order by default, which may work as well if you only ever append to the table rather than using UPDATE queries to modify rows in place.