We have nightly load jobs that writes several hundred thousand records to an Mysql reporting database running in Amazon RDS.
The load jobs are taking several hours to complete, but I am having a hard time figuring out where the bottleneck is.
The instance is currently running with General Purpose (SSD) storage. By looking at the cloudwatch metrics, it appears I am averaging less than 50 IOPS for the last week. However, Network Receive Throughput is less than 0.2 MB/sec.
Is there anyway to tell from this data if I am being bottlenecked by network latency (we are currently loading the data from a remote server...this will change eventually) or by Write IOPS?
If IOPS is the bottleneck, I can easily upgrade to Provisioned IOPS. But if network latency is the issue, I will need to redesign our load jobs to load raw data from EC2 instances instead of our remote servers, which will take some time to implement.
Any advice is appreciated.
UPDATE:
More info about my instance. I am using an m3.xlarge instance. It is provisioned for 500GB in size. The load jobs are done with the ETL tool from pentaho. They pull from multiple (remote) source databases and insert into the RDS instance using multiple threads.
You aren't using up much CPU. Your memory is very low. An instance with more memory should be a good win.
You're only doing 50-150 iops. That's low, you should get 3000 in a burst on standard SSD-level storage. However, if your database is small, it is probably hurting you (since you get 3 iops per GB- so if you are on a 50gb or smaller database, consider paying for provisioned iops).
You might also try Aurora; it speaks mysql, and supposedly has great performance.
If you can spread out your writes, the spikes will be smaller.
A very quick test is to buy provisioned IOPS, but be careful as you may get fewer than you do currently during a burst.
Another quick means to determine your bottleneck is to profile your job execution application with a profiler that understands your database driver. If you're using Java, JProfiler will show the characteristics of your job and it's use of the database.
A third is to configure your database driver to print statistics about the database workload. This might inform you that you are issuing far more queries than you would expect.
Your most likely culprit accessing the database remotely is actually round-trip latency. The impact is easy to overlook or underestimate.
If the remote database has, for example, a 75 millisecond round-trip time, you can't possibly execute more than 1000 (milliseconds/sec) / 75 (milliseconds/round trip) = 13.3 queries per second if you're using a single connection. There's no getting around the laws of physics.
The spikes suggest inefficiency in the loading process, where it gathers for a while, then loads for a while, then gathers for a while, then loads for a while.
Separate but related, if you don't have the MySQL client/server compression protocol enabled on the client side... find out how to enable it. (The server always supports compression but the client has to request it during the initial connection handshake), This won't fix the core problem but should improve the situation somewhat, since less data to physically transfer could mean less time wasted in transit.
I'm not an RDS expert and I don't know if my own particular case can shed you some light. Anyway, hope this give you any kind of insight.
I have a db.t1.micro with 200GB provisioned (that gives be 600 IOPS baseline performance), on a General Purpose SSD storage.
The heaviest workload is when I aggregate thousands of records from a pool of around 2.5 million rows from a 10 million rows table and another one of 8 million rows. I do this every day. This is what I average (it is steady performance, unlike yours where I see a pattern of spikes):
Write/ReadIOPS: +600 IOPS
NetworkTrafficReceived/Transmit throughput: < 3,000 Bytes/sec (my queries are relatively short)
Database connections: 15 (workers aggregating on parallel)
Queue depth: 7.5 counts
Read/Write Throughput: 10MB per second
The whole aggregation task takes around 3 hours.
Also check 10 tips to improve The Performance of your app in AWS slideshare from AWS Summit 2014.
I don't know what else to say since I'm not an expert! Good luck!
In my case it was the amount of records. I was writing only 30 records per minute and had an Write IOPS of round about the same 20 to 30. But this was eating at the CPU, which reduced the CPU credit quite steeply. So I took all the data in that table and moved it to another "historic" table. And cleared all data in that table.
CPU dropped back down to normal measures, but Write IOPS stayed about the same, this was fine though. The problem: Indexes, I think because so many records needed to indexed when inserting it took way more CPU to do this indexing with that amount of rows. Even though the only index I had was a Primary Key.
Moral of my story, the problem is not always where you think it lies, although I had increased Write IOPS it was not the root cause of the problem, but rather the CPU that was being used to do index stuff when inserting which caused the CPU credit to fall.
Not even X-RAY on the Lambda could catch an increased query time. That is when I started to look at the DB directly.
Your Queue depth graph shows > 2 , which clearly indicate that the IOPS is under provisioned. (if Queue depth < 2 then IOPS is over provisioned)
I think you have used the default AUTOCOMMIT = 1 (autocommit mode). It performs a log flush to disk for every insert and exhausted the IOPS.
So,It is better to use (for performance tuning) AUTOCOMMIT = 0 before bulk inserts of data in MySQL, if the insert query looks like
set AUTOCOMMIT = 0;
START TRANSACTION;
-- first 10000 recs
INSERT INTO SomeTable (column1, column2) VALUES (vala1,valb1),(vala2,valb2) ... (val10000,val10000);
COMMIT;
--- next 10000 recs
START TRANSACTION;
INSERT INTO SomeTable (column1, column2) VALUES (vala10001,valb10001),(vala10001,valb10001) ... (val20000,val20000);
COMMIT;
--- next 10000 recs
.
.
.
set AUTOCOMMIT = 1
Using the above approach in t2.micro and inserted 300000 in 15 minutes using PHP.
Related
I'm running MariaDB 10.2.31 on Ubuntu 18.4.4 LTS.
On a regular basis I encounter the following conundrum - especially when starting out in the morning, that is when my DEV environment has been idle for the night - but also during the day from time to time.
I have a table (this applies to other tables as well) with approx. 15.000 rows and (amongst others) an index on a VARCHAR column containing on average 5 to 10 characters.
Notably, most columns including this one are GENERATED ALWAYS AS (JSON_EXTRACT(....)) STORED since 99% of my data comes from a REST API as JSON-encoded strings (and conveniently I simply store those in one column and extract everything else).
When running a query on that column WHERE colname LIKE 'text%' I find query-result durations of i.e. 0.006 seconds. Nice. When I have my query EXPLAINed, I can see that the index is being used.
However, as I have mentioned, when I start out in the morning, this takes way longer (14 seconds this morning). I know about the query cache and I tried this with query cache turned off (both via SET GLOBAL query_cache_type=OFF and RESET QUERY CACHE). In this case I get consistent times of approx. 0.3 seconds - as expected.
So, what would you recommend I should look into? Is my DB sleeping? Is there such a thing?
There are two things that could be going on:
1) Cold caches (overnight backup, mysqld restart, or large processing job results in this particular index and table data being evicted from memory).
2) Statistics on the table go stale and the query planner gets confused until you run some queries against the table and the statistics get refreshed. You can force an update using ANALYZE TABLE table_name.
3) Query planner heisenbug. Very common in MySQL 5.7 and later, never seen it before on MariaDB so this is rather unlikely.
You can get to the bottom of this by enablign the following in the config:
log_output='FILE'
log_slow_queries=1
log_slow_verbosity='query_plan,explain'
long_query_time=1
Then review what is in the slow log just after you see a slow occurrence. If the logged explain plan looks the same for both slow and fast cases, you have a cold caches issue. If they are different, you have a table stats issue and you need to cron ANALYZE TABLE at the end of the over night task that reads/writes a lot to that table. If that doesn't help, as a last resort, hard code an index hint into your query with FORCE INDEX (index_name).
Enable your slow query log with log_slow_verbosity=query_plan,explain and the long_query_time sufficient to catch the results. See if occasionally its using a different (or no) index.
Before you start your next day, look at SHOW GLOBAL STATUS LIKE "innodb_buffer_pool%" and after your query look at the values again. See how many buffer pool reads vs read requests are in this status output to see if all are coming off disk.
As #Solarflare mentioned, backups and nightly activity might be purging the innodb buffer pool of cached data and reverting bad to disk to make it slow again. As part of your nightly activites you could set innodb_buffer_pool_dump_now=1 to save the pages being hot before scripted activity and innodb_buffer_pool_load_now=1 to restore it.
Shout-out and Thank you to everyone giving valuable insight!
From all the tips you guys gave I think I am starting to understand the problem better and beginning to narrow it down:
First thing I found was my default innodb_buffer_pool_size of 134 MB. With the sort and amount of data I'm processing this is ridiculously low - so I was able to increase it.
Very helpful post: https://dba.stackexchange.com/a/27341
And from the docs: https://dev.mysql.com/doc/refman/8.0/en/innodb-buffer-pool-resize.html
Now that I have increased it to close to 2GB and am able to monitor its usage and RAM usage in general (cli: cat /proc/meminfo) I realize that my 4GB RAM is in fact on the low side of things. I am nowhere near seeing any unused overhead (buffer usage still at 99% and free RAM around 100MB).
I will start to optimize RAM usage of my daemon next and see where this leads - but this will not free enough RAM altogether.
#danblack mentioned innodb_buffer_pool_dump_now and innodb_buffer_pool_load_now. This is an interesting approach to maybe use whenever the daemon accesses the DB as I would love to separate my daemon's buffer usage from the front end's (apparently this is not possible!). I will look into this further but as my daemon is running all the time (not only at night) this might not be feasible.
#Gordan Bobic mentioned "refreshing" DBtables by using ANALYZE TABLE tableName. I found this to be quite fast and incorporated it into the daemon after each time it does an extensive read/write. This increases daemon run times by a few seconds but this is no issue at all. And I figure I can't go wrong with it :)
So, in the end I believe my issue to be a combination of things: Too small buffer size, too small RAM, too many read/write operations for that environment (evicting buffered indexes etc.).
Also I will have to learn more about memory allocation etc and optimize this better (large-pages=1 etc).
I run a website with ~500 real time visitors, ~50k daily visitors and ~1,3million total users. I host my server on AWS, where I use several instances of different kind. When I started the website the different instances cost rougly the same. When the website started to gain users the RDS instance (MySQL DB) CPU constantly keept hitting the roof, I had to upgrade it several times, now it have started to take up the main part of the performance and monthly cost (around 95% of (2,8k$/month)). I currently use a database server with 16vCPU and 64GiB of RAM, I also use Multi-AZ Deployment to protect against failures. I wonder if it is normal for the database to be that expensive, or if I have done something terribly wrong?
Database Info
At the moment my database have 40 tables with the most of them have 100k rows, some have ~2millions and 1 have 30 millions.
I have a system the archives rows that are older then 21 days when they are not needed anymore.
Website Info
The website mainly use PHP, but also some NodeJS and python.
Most of the functions of the website works like this:
Start transaction
Insert row
Get last inserted id (lastrowid)
Do some calculations
Updated the inserted row
Update the user
Commit transaction
I also run around 100bots wich polls from the database with 10-30sec interval, they also inserts/updates the database sometimes.
Extra
I have done several things to try to lower the load on the database. Such as enable database cache, use a redis cache for some queries, tried to remove very slow queries, tried to upgrade the storage type to "Provisioned IOPS SSD". But nothing seems to help.
This is the changes I have done to the setting paramters:
I have though about creating a MySQL cluster of several smaller instances, but I don't know if this would help, and I also don't know if this works good with transactions.
If you need any more information, please ask, any help on this issue is greatly appriciated!
In my experience, as soon as you ask the question "how can I scale up performance?" you know you have outgrown RDS (edit: I admit my experience that leads me to this opinion may be outdated).
It sounds like your query load is pretty write-heavy. Lots of inserts and updates. You should increase the innodb_log_file_size if you can on your version of RDS. Otherwise you may have to abandon RDS and move to an EC2 instance where you can tune MySQL more easily.
I would also disable the MySQL query cache. On every insert/update, MySQL has to scan the query cache to see if there any results cached that need to be purged. This is a waste of time if you have a write-heavy workload. Increasing your query cache to 2.56GB makes it even worse! Set the cache size to 0 and the cache type to 0.
I have no idea what queries you run, or how well you have optimized them. MySQL's optimizer is limited, so it's frequently the case that you can get huge benefits from redesigning SQL queries. That is, changing the query syntax, as well as adding the right indexes.
You should do a query audit to find out which queries are accounting for your high load. A great free tool to do this is https://www.percona.com/doc/percona-toolkit/2.2/pt-query-digest.html, which can give you a report based on your slow query log. Download the RDS slow query log with the http://docs.aws.amazon.com/cli/latest/reference/rds/download-db-log-file-portion.html CLI command.
Set your long_query_time=0, let it run for a while to collect information, then change long_query_time back to the value you normally use. It's important to collect all queries in this log, because you might find that 75% of your load is from queries under 2 seconds, but they are run so frequently that it's a burden on the server.
After you know which queries are accounting for the load, you can make some informed strategy about how to address them:
Query optimization or redesign
More caching in the application
Scale out to more instances
I think the answer is "you're doing something wrong". It is very unlikely you have reached an RDS limitation, although you may be hitting limits on some parts of it.
Start by enabling detailed monitoring. This will give you some OS-level information which should help determine what your limiting factor really is. Look at your slow query logs and database stats - you may have some queries that are causing problems.
Once you understand the problem - which could be bad queries, I/O limits, or something else - then you can address them. RDS allows you to create multiple read replicas, so you can move some of your read load to slaves.
You could also move to Aurora, which should give you better I/O performance. Or use PIOPS (or allocate more disk, which should increase performance). You are using SSD storage, right?
One other suggestion - if your calculations (step 4 above) takes a significant amount of time, you might want look at breaking it into two or more transactions.
A query_cache_size of more than 50M is bad news. You are writing often -- many times per second per table? That means the QC needs to be scanned many times/second to purge the entries for the table that changed. This is a big load on the system when the QC is 2.5GB!
query_cache_type should be DEMAND if you can justify it being on at all. And in that case, pepper the SELECTs with SQL_CACHE and SQL_NO_CACHE.
Since you have the slowlog turned on, look at the output with pt-query-digest. What are the first couple of queries?
Since your typical operation involves writing, I don't see an advantage of using readonly Slaves.
Are the bots running at random times? Or do they all start at the same time? (The latter could cause terrible spikes in CPU, etc.)
How are you "archiving" "old" records? It might be best to use PARTITIONing and "transportable tablespaces". Use PARTITION BY RANGE and 21 partitions (plus a couple of extras).
Your typical transaction seems to work with one row. Can it be modified to work with 10 or 100 all at once? (More than 100 is probably not cost-effective.) SQL is much more efficient in doing lots of rows at once versus lots of queries of one row each. Show us the SQL; we can dig into the details.
It seems strange to insert a new row, then update it, all in one transaction. Can't you completely compute it before doing the insert? Hanging onto the inserted_id for so long probably interferes with others doing the same thing. What is the value of innodb_autoinc_lock_mode?
Do the "users" interactive with each other? If so, in what way?
Lately we are seeing some queries in mysql(master) logs but no idea why they are shown there:
Queries are select/update table where id = <some integer>.
There is index on id
table size is below 100 000
Rows scanned are in hundreds (sometimes < 100)
Server is running on extremely good hardware
there are no joins involved
We do not see any heavy activity running on database at that time
tables are innodb
the same queries generally don’t even take 50ms, but sometimes all the execution of these queries take about 4-8 seconds
One observation was all the similar "non-slow-but-weirdly-taking-high-time" queries take almost the same amount of time for some duration . I.e. queries like stated in the top will all take about 4.35 seconds with variation of 0.05 seconds.
Does the network latency/packets-drop affect mysql query timing?
show processlist;
show global status like '%onnect%';
show global status like '%open%';
Is anything backed up? Is it waiting in a queue? waiting for file handles? What are your max_connections, open-files-limit, thread_concurrency ?
One side question: does the network latency/packets-drop affect mysql query timing?
Yes, the timeout must occur before the query is resent by the client
Do you see these problems locally or over the network? If the latter, then obviously packet drops can affect your performance if you are measuring from the client.
Is it running in a virtual machine that can affect the performance?
Disk problems?
How is serialization set up? Can it be a contention problem by many processes accessing the same row?
You may want to enable the query/slow query logs to see if there is any sort of pattern that causes this.
Mysql slow log is not representative source to learn about your slow queries. If something makes server work slow all queries usually go to slow log.
E.g. if you have some slow blocking select on MyISAM a lot of PK updates will go to slow log.
You need to search for other slow queries or server problems. What about load average on this particular machine? Isn't mysql displaced into swap memory? Other applications? Queries per second?
I have a denormalized table product with about 6 million rows (~ 2GB) mainly for lookups. Fields include price, color, unitprice, weight, ...
I have BTREE indexes on color etc. Query conditions are dynamically generated from the Web, such as
select count(*)
from product
where color = 1 and price > 5 and price < 100 and weight > 30 ... etc
and
select *
from product
where color = 2 and price > 35 and unitprice < 110
order by weight
limit 25;
I used to use InnoDB and tried MEMORY tables, and switched to NDB hoping more concurrent queries can be done faster. I have 2 tables with the same schema, indexes, and data. One is InnoDB while the other is NDB. But the results are very disappointing:for the queries mentioned above, InnoDB is like 50 times faster than NDB. It's like 0.8 seocond vs 40 seconds. For this test I was running only a single select query repeatedbly. Both InnoDB and NDB queries are using the same index on color.
I am using mysql-5.1.47 ndb-7.1.5 on a dual Xeon 5506 (8 cores total), 32GB memory running CentOS 5. I set up 2 NDB Data nodes, one MGM node and one MYSQL node on the same box. For each node I allocated like 9GB memory, and also tried MaxNoOfExecutionThreads=8, LockPagesInMainMemory, LockExecuteThreadToCPU and many other config parameters, but no luck. While NDB is running the query, my peak CPU load was only like 200%, i.e., only 2 out of 8 cores were busy. Most of the time it was like 100%. I was using ndbmtd, and verified in the data node log and the LQH threads were indeed spawned.
I also tried explain, profiling -- it just showing that Sending data was consuming most of the time. I also went thru some Mysql Cluster tuning documents available online, not very helpful in my case.
Anybody can shed some light on this? Is there any better way to tune an NDB database? Appreciate it!
You need to pick the right storage engine for your application.
myISAM -- read frequently / write infrequently. Ideal for data lookups in big tables. Does reasonably well with complex indexes and is quite good for batch reloads.
MEMORY -- good for fast access to relatively small and simple tables.
InnoDB -- good for transaction processing. Also good for a mixed read / write workload.
NDB -- relatively less mature. Good for fault tolerance.
The mySQL server is not inherently multiprocessor software. So adding cores isn't necessarily going to jack up performance. A good host for mySQL is a decent two-core system with plenty of RAM and the fastest disk IO channels and disks you can afford. Do NOT put your mySQL data files on a networked or shared file system, unless you don't care about query performance.
If you're running on Linux issue these two commands (on the machine running the mySQL server) to see whether you're burning all your cpu, or burning all your disk IO:
sar -u 1 10
sar -d 1 10
Your application sounds like a candidate for myISAM. It sounds like you have plenty of hardware. In that case you can build a master server and an automatically replicated slave server But you may be fine with just one server. This will be easier to maintain.
Edit It's eight years latar and this answer is now basically obsolete.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
We deploy an (AJAX - based) Instant messenger which is serviced by a Comet server. We have a requirement to store the sent messages in a DB for long-term archival purposes in order to meet legal retention requirements.
Which DB engine provides the best performance in this write-once, read never (with rare exceptions) requirement?
We need at least 5000 Insert/Sec. I am assuming neither MySQL nor PostgreSQL
can meet these requirements.
Any proposals for a higher performance solution? HamsterDB, SQLite, MongoDB ...?
Please ignore the above Benchmark we had a bug inside.
We have Insert 1M records with following columns: id (int), status (int), message (140 char, random).
All tests was done with C++ Driver on a Desktop PC i5 with 500 GB Sata Disk.
Benchmark with MongoDB:
1M Records Insert without Index
time: 23s, insert/s: 43478
1M Records Insert with Index on Id
time: 50s, insert/s: 20000
next we add 1M records to the same table with Index and 1M records
time: 78s, insert/s: 12820
that all result in near of 4gb files on fs.
Benchmark with MySQL:
1M Records Insert without Index
time: 49s, insert/s: 20408
1M Records Insert with Index
time: 56s, insert/s: 17857
next we add 1M records to the same table with Index and 1M records
time: 56s, insert/s: 17857
exactly same performance, no loss on mysql on growth
We see Mongo has eat around 384 MB Ram during this test and load 3 cores of the cpu, MySQL was happy with 14 MB and load only 1 core.
Edorian was on the right way with his proposal, I will do some more Benchmark and I'm sure we can reach on a 2x Quad Core Server 50K Inserts/sec.
I think MySQL will be the right way to go.
If you are never going to query the data, then i wouldn't store it to a database at all, you will never beat the performance of just writing them to a flat file.
What you might want to consider is the scaling issues, what happens when it's to slow to write the data to a flat file, will you invest in faster disk's, or something else.
Another thing to consider is how to scale the service so that you can add more servers without having to coordinate the logs of each server and consolidate them manually.
edit: You wrote that you want to have it in a database, and then i would also consider security issues with havening the data on line, what happens when your service gets compromised, do you want your attackers to be able to alter the history of what have been said?
It might be smarter to store it temporary to a file, and then dump it to an off-site place that's not accessible if your Internet fronts gets hacked.
If you don't need to do queries, then database is not what you need. Use a log file.
it's only stored for legal reasons.
And what about the detailed requirements? You mention the NoSQL solutions, but these can't promise the data is realy stored on disk. In PostgreSQL everything is transaction safe, so you're 100% sure the data is on disk and is available. (just don't turn of fsync)
Speed has a lot to do with your hardware, your configuration and your application. PostgreSQL can insert thousands of record per second on good hardware and using a correct configuration, it can be painfully slow using the same hardware but using a plain stupid configuration and/or the wrong approach in your application. A single INSERT is slow, many INSERT's in a single transaction are much faster, prepared statements even faster and COPY does magic when you need speed. It's up to you.
I don't know why you would rule out MySQL. It could handle high inserts per second. If you really want high inserts, use the BLACK HOLE table type with replication. It's essentially writing to a log file that eventually gets replicated to a regular database table. You could even query the slave without affecting insert speeds.
Firebird can easily handle 5000 Insert/sec if table doesn't have indices.
Depending in your system setup MySql can easily handle over 50.000 inserts per sec.
For tests on a current system i am working on we got to over 200k inserts per sec. with 100 concurrent connections on 10 tables (just some values).
Not saying that this is the best choice since other systems like couch could make replication/backups/scaling easier but dismissing mysql solely on the fact that it can't handle so minor amounts of data it a little to harsh.
I guess there are better solutions (read: cheaper, easier to administer) solutions out there.
Use Event Store (https://eventstore.org), you can read (https://eventstore.org/docs/getting-started/which-api-sdk/index.html) that when using TCP client you can achieve 15000-20000 writes per second. If you will ever need to do anything with data, you can use projections or do the transformations based on streams to populate any other datastore you wish.
You can create even cluster.
If money plays no role, you can use TimesTen.
http://www.oracle.com/timesten/index.html
A complete in memory database, with amazing speed.
I would use the log file for this, but if you must use a database, I highly recommend Firebird. I just tested the speed, it inserts about 10k records per second on quite average hardware (3 years old desktop computer). The table has one compound index, so I guess it would work even faster without it:
milanb#kiklop:~$ fbexport -i -d test -f test.fbx -v table1 -p **
Connecting to: 'LOCALHOST'...Connected.
Creating and starting transaction...Done.
Create statement...Done.
Doing verbatim import of table: TABLE1
Importing data...
SQL: INSERT INTO TABLE1 (AKCIJA,DATUM,KORISNIK,PK,TABELA) VALUES (?,?,?,?,?)
Prepare statement...Done.
Checkpoint at: 1000 lines.
Checkpoint at: 2000 lines.
Checkpoint at: 3000 lines.
...etc.
Checkpoint at: 20000 lines.
Checkpoint at: 21000 lines.
Checkpoint at: 22000 lines.
Start : Thu Aug 19 10:43:12 2010
End : Thu Aug 19 10:43:14 2010
Elapsed : 2 seconds.
22264 rows imported from test.fbx.
Firebird is open source, and completely free even for commercial projects.
I believe the answer will as well depend on hard disk type (SSD or not) and also the size of the data you insert. I was inserting a single field data into MongoDB on a dual core Ubuntu machine and was hitting over 100 records per second. I introduced some quite large data to a field and it dropped down to about 9ps and the CPU running at about 175%! The box doesn't have SSD and so I wonder if I'd have gotten better with that.
I also ran MySQL and it was taking 50 seconds just to insert 50 records on a table with 20m records (with about 4 decent indexes too) so as well with MySQL it will depend on how many indexes you have in place.