MySQL Cluster is much slower than InnoDB

I have a denormalized table product with about 6 million rows (~ 2GB) mainly for lookups. Fields include price, color, unitprice, weight, ...
I have BTREE indexes on color etc. Query conditions are dynamically generated from the Web, such as
select count(*)
from product
where color = 1 and price > 5 and price < 100 and weight > 30 ... etc
and
select *
from product
where color = 2 and price > 35 and unitprice < 110
order by weight
limit 25;
I used to use InnoDB and tried MEMORY tables, and switched to NDB hoping more concurrent queries could be done faster. I have 2 tables with the same schema, indexes, and data: one is InnoDB while the other is NDB. But the results are very disappointing: for the queries mentioned above, InnoDB is about 50 times faster than NDB, roughly 0.8 seconds vs 40 seconds. For this test I was running only a single select query repeatedly. Both the InnoDB and NDB queries are using the same index on color.
I am using mysql-5.1.47 ndb-7.1.5 on a dual Xeon 5506 (8 cores total) with 32GB memory, running CentOS 5. I set up 2 NDB data nodes, one MGM node, and one MySQL node on the same box. For each node I allocated about 9GB of memory, and also tried MaxNoOfExecutionThreads=8, LockPagesInMainMemory, LockExecuteThreadToCPU, and many other config parameters, but no luck. While NDB was running the query, my peak CPU load was only about 200%, i.e., only 2 out of 8 cores were busy; most of the time it was around 100%. I was using ndbmtd, and verified in the data node log that the LQH threads were indeed spawned.
I also tried EXPLAIN and profiling -- they just showed that Sending data was consuming most of the time. I also went through some MySQL Cluster tuning documents available online, but they were not very helpful in my case.
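For reference, a profiling session of the kind described would look like this (using a shortened form of the queries above):
SET profiling = 1;
select count(*) from product where color = 1 and price > 5 and price < 100;
SHOW PROFILES;            -- lists recent queries with their durations
SHOW PROFILE FOR QUERY 1; -- per-stage breakdown, including "Sending data"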
Can anybody shed some light on this? Is there any better way to tune an NDB database? Appreciate it!

You need to pick the right storage engine for your application.
MyISAM -- read frequently / write infrequently. Ideal for data lookups in big tables. Does reasonably well with complex indexes and is quite good for batch reloads.
MEMORY -- good for fast access to relatively small and simple tables.
InnoDB -- good for transaction processing. Also good for a mixed read / write workload.
NDB -- relatively less mature. Good for fault tolerance.
The MySQL server is not inherently multiprocessor software, so adding cores isn't necessarily going to jack up performance. A good host for MySQL is a decent two-core system with plenty of RAM and the fastest disk IO channels and disks you can afford. Do NOT put your MySQL data files on a networked or shared file system, unless you don't care about query performance.
If you're running on Linux, issue these two commands (on the machine running the MySQL server) to see whether you're burning all your CPU or burning all your disk IO:
sar -u 1 10   # CPU utilization: ten samples at one-second intervals
sar -d 1 10   # block-device (disk IO) activity, same sampling
Your application sounds like a candidate for MyISAM. It sounds like you have plenty of hardware. In that case you can build a master server and an automatically replicated slave server, but you may be fine with just one server, which will be easier to maintain.
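If you go the MyISAM route, the engine switch itself is a one-liner (a sketch only -- test on a copy first, since MyISAM gives up transactions and row-level locking):
ALTER TABLE product ENGINE=MyISAM;
ANALYZE TABLE product;  -- refresh index statistics after the conversion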
Edit: It's eight years later and this answer is now basically obsolete.

Related

AWS RDS MySQL performance

I am running MySQL 8 on AWS RDS, and I have an InnoDB table with 260,000 rows in it, no extraordinary data size.
My development server has 1GB RAM and 1 vCPU, and my AWS RDS server is a t3.small.
SELECT COUNT operations take too long (33 seconds on average) and my data tables in my Laravel project time out. What could be the problem?
select count('special_cargo_id') from special_cargos
33 seconds
Is special_cargo_id your PK, what is its type, and does it fit into innodb_buffer_pool_size?
Run:
select count(1) from special_cargos;
a few times. Does it run quickly after the first time? If it does, then the reason it slows down sometimes is that you are memory starved and other data pushes the PK of that table out of the InnoDB buffer pool. If it is always slow, the PK most likely never fits into the buffer pool.
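A rough way to check the fit, using standard information_schema columns (the table name comes from the question):
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SELECT (data_length + index_length) / 1024 / 1024 AS total_mb
FROM information_schema.tables
WHERE table_schema = DATABASE() AND table_name = 'special_cargos';
If total_mb is close to (or larger than) the buffer pool size, the index cannot stay resident in memory.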
If you're trying to debug the performance of your database, RDS has a great built-in tool for that.
With RDS Performance Insights you should be able to identify where the bottleneck is.

Optimize configuration of MySQL for a single user with complex queries

I would like to optimize MySQL for a single user (me). No one else is connected to the database. MySQL is installed locally on my PC. The data covers around 70 tables with 150 GB of data in total.
The data is static, so I won't make changes to it (no UPDATE or DELETE). I want to analyze the data, so I will run large and complex queries, including SELECT and JOIN, over a large proportion of the data.
Machine:
Windows 7 64-bit
Intel Core i7-4800MQ @ 2.70 GHz
32 GB RAM
2x 512 GB SSD
MySQL 5.7, InnoDB
What I did so far:
Deactivated HyperThreading (MySQL uses only one virtual core per query - CPU usage 12.5% --> 25%)
Declared primary keys
Indexed all foreign keys
innodb_buffer_pool_size = 25G
max_connections=10
Use of InnoDB
So what can I do to optimize the configuration (not the query itself) for single-user queries?
database engine?
general configuration?
better caching of the information used in joins?
Note:
Under the current configuration, the CPU usage is the bottleneck: it sits at 25% when I run a complex query. As a test I tried some huge queries (fetching a lot of data). If I can believe the timing of MySQL Workbench, the duration was only 4 seconds, but after 10 hours of running it still couldn't finish fetching the data...
Thanks a lot!
You could try MyISAM. It is especially good for read-intensive (SELECT) tables, and you should set a large key_buffer_size. But I am not sure about the JOIN performance... Give it a try.
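For example (a sketch only; the values are illustrative, not tuned, and key_buffer_size caches MyISAM indexes only, so size it to your index footprint):
ALTER TABLE big_fact_table ENGINE=MyISAM;   -- hypothetical table name
SET GLOBAL key_buffer_size = 8589934592;    -- 8GB; adjust to your RAM, and set it in my.cnf to persist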

IOPS or Throughput? - Determining Write Bottleneck in Amazon RDS Instance

We have nightly load jobs that write several hundred thousand records to a MySQL reporting database running in Amazon RDS.
The load jobs are taking several hours to complete, but I am having a hard time figuring out where the bottleneck is.
The instance is currently running with General Purpose (SSD) storage. By looking at the CloudWatch metrics, it appears I am averaging less than 50 IOPS for the last week. However, Network Receive Throughput is less than 0.2 MB/sec.
Is there any way to tell from this data whether I am being bottlenecked by network latency (we are currently loading the data from a remote server...this will change eventually) or by write IOPS?
If IOPS is the bottleneck, I can easily upgrade to Provisioned IOPS. But if network latency is the issue, I will need to redesign our load jobs to load raw data from EC2 instances instead of our remote servers, which will take some time to implement.
Any advice is appreciated.
UPDATE:
More info about my instance. I am using an m3.xlarge instance. It is provisioned for 500GB in size. The load jobs are done with the ETL tool from Pentaho. They pull from multiple (remote) source databases and insert into the RDS instance using multiple threads.
You aren't using up much CPU. Your memory is very low. An instance with more memory should be a good win.
You're only doing 50-150 IOPS. That's low; you should get 3000 in a burst on standard SSD-level storage. However, if your database is small, that is probably hurting you (since you get 3 IOPS per GB, so if you are on a 50GB or smaller database, consider paying for provisioned IOPS).
You might also try Aurora; it speaks MySQL and supposedly has great performance.
If you can spread out your writes, the spikes will be smaller.
A very quick test is to buy provisioned IOPS, but be careful as you may get fewer than you do currently during a burst.
Another quick way to determine your bottleneck is to profile your job execution application with a profiler that understands your database driver. If you're using Java, JProfiler will show the characteristics of your job and its use of the database.
A third is to configure your database driver to print statistics about the database workload. This might inform you that you are issuing far more queries than you would expect.
Your most likely culprit accessing the database remotely is actually round-trip latency. The impact is easy to overlook or underestimate.
If the remote database has, for example, a 75 millisecond round-trip time, you can't possibly execute more than 1000 (milliseconds/sec) / 75 (milliseconds/round trip) = 13.3 queries per second if you're using a single connection. There's no getting around the laws of physics.
The spikes suggest inefficiency in the loading process, where it gathers for a while, then loads for a while, then gathers for a while, then loads for a while.
Separately but relatedly, if you don't have the MySQL client/server compression protocol enabled on the client side, find out how to enable it. (The server always supports compression, but the client has to request it during the initial connection handshake.) This won't fix the core problem, but it should improve the situation somewhat, since less data to physically transfer means less time wasted in transit.
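Whether compression was actually negotiated can be checked from an open session (enabling it is driver-specific, e.g. the mysql command-line client's --compress option):
SHOW SESSION STATUS LIKE 'Compression';  -- ON if the handshake enabled it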
I'm not an RDS expert and I don't know if my own particular case can shed some light for you. Anyway, I hope this gives you some kind of insight.
I have a db.t1.micro with 200GB provisioned (that gives me a 600 IOPS baseline performance), on General Purpose SSD storage.
The heaviest workload is when I aggregate thousands of records from a pool of around 2.5 million rows drawn from a 10-million-row table and another one of 8 million rows. I do this every day. This is what I average (it is steady performance, unlike yours, where I see a pattern of spikes):
Write/ReadIOPS: +600 IOPS
NetworkTrafficReceived/Transmit throughput: < 3,000 Bytes/sec (my queries are relatively short)
Database connections: 15 (workers aggregating on parallel)
Queue depth: 7.5 counts
Read/Write Throughput: 10MB per second
The whole aggregation task takes around 3 hours.
Also check the "10 tips to improve the performance of your app in AWS" slideshare from AWS Summit 2014.
I don't know what else to say since I'm not an expert! Good luck!
In my case it was the number of records. I was writing only 30 records per minute and had write IOPS of about the same, 20 to 30. But this was eating at the CPU, which reduced the CPU credits quite steeply. So I took all the data in that table, moved it to another "historic" table, and cleared all the data in the original table.
CPU dropped back down to normal, but write IOPS stayed about the same, which was fine. The problem: indexes. I think that because so many records needed to be indexed when inserting, it took much more CPU to do that indexing with that number of rows, even though the only index I had was the primary key.
Moral of my story: the problem is not always where you think it lies. Although I had increased write IOPS, that was not the root cause; rather, it was the CPU being used for index work during inserts that caused the CPU credits to fall.
Not even X-Ray on the Lambda could catch the increased query time. That is when I started to look at the DB directly.
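The "historic table" move described above could look roughly like this (table names are made up for illustration):
INSERT INTO readings_history SELECT * FROM readings;  -- archive the old rows
TRUNCATE TABLE readings;  -- the hot table and its index start small again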
Your queue depth graph shows > 2, which clearly indicates that the IOPS are under-provisioned. (If queue depth < 2, then IOPS are over-provisioned.)
I think you have used the default AUTOCOMMIT = 1 (autocommit mode). It performs a log flush to disk for every insert, which exhausts the IOPS.
So it is better (for performance tuning) to use AUTOCOMMIT = 0 before bulk inserts of data in MySQL, with insert queries that look like:
set AUTOCOMMIT = 0;
START TRANSACTION;
-- first 10000 recs
INSERT INTO SomeTable (column1, column2) VALUES (vala1,valb1),(vala2,valb2) ... (val10000,val10000);
COMMIT;
--- next 10000 recs
START TRANSACTION;
INSERT INTO SomeTable (column1, column2) VALUES (vala10001,valb10001),(vala10002,valb10002) ... (val20000,val20000);
COMMIT;
--- next 10000 recs
.
.
.
set AUTOCOMMIT = 1;
Using the above approach on a t2.micro, I inserted 300,000 records in 15 minutes using PHP.

Questions about improving the performance of Mysql concurrent insert and read data

Basic info:
My MySQL database uses TokuDB, InnoDB, and MyISAM tables.
Server info:
16 cores, 64GB RAM, CentOS 6.2, MySQL v5.5
Process:
1. Import a large amount of data from one text file into one TokuDB table.
2. Select data by joining different tables.
When processes 1 and 2 are running at the same time, the whole operation becomes much slower.
Does anyone know the specific reason?
Any suggestions to improve it?
Separate the IO onto different disks/arrays. Having all IO on a single partition/array results in horrible performance. If possible, invest in a dedicated drive array such as IBM's DS3524 or an HP Smart Array. Connecting the DB server through Fibre Channel (or better yet SAS2) will give you an incredible performance gain. I stopped putting lots of disks into the server itself a few years ago; I get 5X the performance with MySQL on a drive array versus disks in the server.
In TokuDB, LOAD DATA INFILE works much faster when importing into empty tables (especially when you have a non-incrementing primary key or a unique index).
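A typical bulk load of that kind would look like this (file path, table name, and delimiters are illustrative):
LOAD DATA INFILE '/tmp/import.txt'
INTO TABLE staging_table
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';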

What are the limitations of implementing MySQL NDB Cluster?

I want to implement NDB Cluster for MySQL Cluster 6. I want to do it for a very large data structure with a minimum of 2 million records.
I want to know if there are any limitations of implementing NDB Cluster, for example RAM size, number of databases, or size of database for NDB Cluster.
2 million databases? I assume you meant "rows".
Anyway, concerning limitations: one of the most important things to keep in mind is that NDB/MySQL Cluster is not a general-purpose database. Most notably, join operations, but also subqueries and range operations (queries like: orders created between now and a week ago), can be considerably slower than what you might expect. This is partly because the data is distributed across multiple nodes. Although some improvements have been made, join performance can still be very disappointing.
On the other hand, if you need to deal with many (preferably small) concurrent transactions (typically single-row updates/inserts/deletes and lookups by primary key) and you manage to keep all of your data in memory, then it can be a very scalable and performant solution.
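That sweet spot looks like short, primary-key-driven transactions, for example (hypothetical table):
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE account_id = 42;  -- single-row PK lookup
COMMIT;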
You should ask yourself why you want Cluster. If you simply want the ordinary database you have now, except with added 99.999% availability, then you may be disappointed. Certainly MySQL Cluster can provide you with great availability and uptime, but the workload of your app may not be very well suited to the things Cluster is good for. Plus, you may be able to use another high-availability solution to increase the uptime of your otherwise traditional database.
BTW - here's a list of limitations as per the doc: http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-limitations.html
But whatever you do, try out Cluster and see if it's good for you. MySQL Cluster is not "MySQL + 5 nines". You'll find out when you try.
NDB Cluster comes with two types of storage options:
1. In-memory storage.
2. Disk storage.
NDB was introduced as an in-memory data store, and from version 7.4 (MySQL 5.6) onwards it started supporting disk storage.
The current version 7.5 (MySQL 5.7) supports disk storage, and in this case there are no size constraints, as the data resides on disk and the limit depends on the disk space available to you.
Disk Storage configurations - https://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-disk-data-symlinks.html
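Setting up disk-based NDB tables takes a log file group, a tablespace, and STORAGE DISK on the table. A minimal sketch (all names and sizes are illustrative):
CREATE LOGFILE GROUP lg1
  ADD UNDOFILE 'undo_1.log' INITIAL_SIZE 128M
  ENGINE NDBCLUSTER;
CREATE TABLESPACE ts1
  ADD DATAFILE 'data_1.dat' USE LOGFILE GROUP lg1 INITIAL_SIZE 256M
  ENGINE NDBCLUSTER;
CREATE TABLE t1 (
  id INT PRIMARY KEY,     -- indexed columns always stay in memory
  payload VARCHAR(255)    -- non-indexed columns can be stored on disk
) TABLESPACE ts1 STORAGE DISK ENGINE NDBCLUSTER;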
In-memory storage in NDB Cluster is also quite mature, and you can define memory usage in the management node's config.ini file, for example:
[ndbd default]
DataMemory=3072M
IndexMemory=384M
For an average table (depending on the data stored in its columns), the total DB size should be less than 1GB, which can easily be configured.
Note: in my own implementation I faced one performance challenge, as the performance of NDB degrades with an increasing number of rows in a table.
Under high concurrent load, reads will degrade as the row count increases.
Make sure you don't do full table scans; provide sufficiently selective WHERE clause predicates.
For proper performance, define secondary indexes properly as per your query pattern.
Defining secondary indexes will again increase memory consumption, so plan your query patterns and memory resources accordingly.
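For example, if the dominant query filters on color and a price range, an index like this (table and column names are illustrative) avoids the full table scan, at the cost of extra memory:
ALTER TABLE product ADD INDEX idx_color_price (color, price);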