Move from MySQL to AWS DynamoDB? 5m rows, 4 tables, 1k writes per second

I'm considering moving my MySQL architecture to AWS DynamoDB. My application has a requirement of 1,000 read/write requests per second. How does this play with PHP and updates? Having 1,000 workers process DynamoDB reads/writes seems like it will take a higher toll on CPU/memory than the equivalent MySQL reads/writes.
I have thought about using a log file to store the updated information, then creating scripts that process the log file to take load off the DB - however I'm stuck on file locking, and would be curious if anyone has ideas on implementing this: 300 separate scripts would be writing to a single log file, and the log file would then be processed into the DB every minute. I'm not sure how this could be implemented without locking. The server scripts are written in PHP.
Current Setup
MySQL database (RDS on AWS)
Table A has 5m records - the main DB table, 30 columns, mostly numerical plus text under 500 chars (growth: +30k records per day). It has relationships with 4 other tables containing:
Table B - 15m records (growth: +90k records per day).
Table C - 2m records (growth: +10k records per day).
Table D - 4m records (growth: +15k records per day).
Table E - 1m records (growth: +5k records per day).
Table A gets around 1,000 record updates per second; updated/added rows are then queued for indexing in Solr search.
Would appreciate some much needed advice to lower costs. Are there hidden costs or other solutions I should be aware of before starting development?

I'm afraid the scope for performance improvement for your DB is just too broad, but here are a few pointers.
IOPS: some devops teams choose to provision 200 GB of general-purpose storage (200 x 3 = 600 baseline IOPS) rather than buying Provisioned IOPS for smaller storage (say they only need 50 GB, then purchase Provisioned IOPS on top). You need to open a spreadsheet to find the pricing/performance sweet spot.
You might need to create another "denormalised table" from table A if you frequently select from table A but don't need to traverse the whole <500-char text column. Don't underestimate the text workload. (A sketch follows at the end of this answer.)
Index, index, index.
If you deal with tons of non-linear searches, perhaps copy the part of the relevant data that you think will improve performance to DynamoDB - test it first - but maintain the RDBMS structure.
There is no one-size-fits-all solution. Also look into using a message queue if required.
Adding 200k records per day is actually not much for today's RDBMSs. Even the 1,000 IOPS only happen in bursts. If querying is the heaviest part, then that is the part you need to optimise.
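To illustrate the denormalised-table suggestion: a minimal sketch, assuming table A is called table_a and has a wide text column plus some hypothetical columns (id, status, updated_at) that the hot queries actually need - the names are assumptions, not your real schema.

    -- Narrow copy of table_a without the wide text column, so frequent
    -- lookups stay small and cache-friendly. All names are hypothetical.
    CREATE TABLE table_a_narrow (
        id         INT UNSIGNED NOT NULL,
        status     TINYINT UNSIGNED NOT NULL,
        updated_at DATETIME NOT NULL,
        PRIMARY KEY (id),
        KEY idx_status_updated (status, updated_at)
    ) ENGINE=InnoDB;

    -- One-off initial fill; afterwards keep it in sync from application code
    -- (or triggers) whenever table_a changes.
    INSERT INTO table_a_narrow (id, status, updated_at)
    SELECT id, status, updated_at
    FROM table_a;

Whether this pays off depends on how often the hot queries can avoid touching the text column at all, so measure before committing to it.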

Related

Best practice to archive less frequently accessed data in MySQL

I have an e-commerce application that has been running on a MySQL server for the last 10 years. There are two tables in particular:
orders (approx. 30 million rows and 3 GB of data)
order_item (approx. 45 million rows and 3.5 GB of data)
These tables contain all orders and their respective products purchased in the last 10 years. Generally the last 6 months of data is the most frequently accessed; all earlier data is only used in various reports by the business/marketing teams, or when a customer checks their order history.
As these tables constantly grow in size, we are running into increased query times for both reads and writes. Therefore, I am considering archiving some data from these tables to improve performance. The main concern is that archived data should still be available for reads by reports and customers.
What would be the right archiving strategy for this?
PARTITIONing is unlikely to provide any performance benefit.
Please provide SHOW CREATE TABLE and some of the "slow" queries so I can analyze the access patterns, clustered index choice, the potential for Partitioning, etc.
Sometimes a change to the PK (which is "clustered" with the data) can greatly improve the "locality of reference", which provides performance benefits. A user going back in time may also blow out the cache (the "buffer pool"). Changing the PK to start with user_id may significantly improve that issue.
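For example, a hedged sketch of that PK change, assuming orders currently has order_id as its primary key and a user_id column (adjust names to the real schema, and expect the rebuild to take a long time on 30 million rows):

    -- Rebuild the clustered index so each customer's orders are stored together.
    -- Keep a UNIQUE key on order_id so lookups by order id (and any foreign
    -- keys referencing it) still work.
    ALTER TABLE orders
        DROP PRIMARY KEY,
        ADD PRIMARY KEY (user_id, order_id),
        ADD UNIQUE KEY uk_order_id (order_id);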
Another possibility is to use replication to segregate "read-only" queries that go back in time from the "active" records (read/write in the last few months). This helps by decreasing I/O: customers stay on the Primary while the "reports" are relegated to a Replica.
How much RAM? When the data gets "big", running a "report" may blow out the cache, making everything slower. Summary table(s) are a very good cure for that.
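A hedged sketch of such a summary table, assuming hypothetical order_date and total columns on orders:

    -- One row per day; reports read this instead of scanning the big table
    -- and blowing out the buffer pool. Column names are assumptions.
    CREATE TABLE orders_daily_summary (
        order_day    DATE NOT NULL,
        order_count  INT UNSIGNED NOT NULL,
        total_amount DECIMAL(14,2) NOT NULL,
        PRIMARY KEY (order_day)
    ) ENGINE=InnoDB;

    -- Refresh yesterday's row from a nightly job (or maintain it incrementally).
    REPLACE INTO orders_daily_summary (order_day, order_count, total_amount)
    SELECT DATE(order_date), COUNT(*), SUM(total)
    FROM orders
    WHERE order_date >= CURRENT_DATE - INTERVAL 1 DAY
      AND order_date <  CURRENT_DATE
    GROUP BY DATE(order_date);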
The optimal indexes for a Partitioned table are necessarily different than for the equivalent non-partitioned table.

Database design for heavy timed data logging - Car Tracking System

I am making a car tracking system and I want to store the data that each car sends every 5 seconds in a MySQL database. Assume that I have 1,000 cars transmitting data to my system every 5 seconds, and the data is stored in one table. At some point I will want to query this table to generate reports for a specific vehicle. I am torn between logging all the vehicles' data in one table or creating a table for each vehicle (1,000 tables). Which is more efficient?
OK: 86,400 seconds per day / 5 = 17,280 records per car per day.
Across 1,000 cars that results in 17,280,000 records per day. This is not an issue for MySQL in general.
And a well-designed table will be easy to query.
If you go for one table per car - what happens when there are 2,000 cars in the future?
But the question is also: how long do you want to store the data?
It is easy to calculate when your database will reach 200 GB, 800 GB, 2 TB, ...
One table, not one table per car. A database with 1000 tables will be a dumpster fire when you try to back it up or maintain it.
Keep the rows of that table as short as you possibly can; it will have many records.
Index that table both on timestamp and on (car_id, timestamp). The second index will allow you to report on individual cars efficiently.
Read https://use-the-index-luke.com/
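A minimal sketch of that single table - column names and types are assumptions, not a prescription:

    -- One row per car per 5-second sample.
    CREATE TABLE car_positions (
        car_id SMALLINT UNSIGNED NOT NULL,
        ts     DATETIME NOT NULL,
        lat    DECIMAL(9,6) NOT NULL,
        lng    DECIMAL(9,6) NOT NULL,
        speed  SMALLINT UNSIGNED NOT NULL,
        PRIMARY KEY (car_id, ts),   -- per-car reports read one contiguous range
        KEY idx_ts (ts)             -- fleet-wide queries by time window
    ) ENGINE=InnoDB;

A per-vehicle report then becomes a simple range scan, e.g. WHERE car_id = 42 AND ts >= '2024-01-01' AND ts < '2024-02-01'.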
This is the "tip of the iceberg". There are about 5 threads here and on dba.stackexchange relating to tracking cars/trucks. Here are some further tips.
Keep datatypes as small as possible. Your table(s) will become huge -- threatening to overflow the disk, and slowing down queries due to "bulky rows mean that fewer rows can be cached in RAM".
Do you keep the "same" info for a car that is sitting idle overnight? Think of how much disk space this is taking.
If you are using HDD disks, plan on about 100 INSERTs/second before you need to do some redesign of the ingestion process. (1,000/sec for SSDs.) There are techniques that can give you 10x, maybe 100x, but you must apply them.
Will you be having several servers collecting the data, then doing simple inserts into the database? My point is that that may be your first bottleneck.
PRIMARY KEY(car_id, ...) so that accessing data for one car is efficient.
Today, you say the data will be kept forever. But have you computed how big your disk will need to be?
One way to shrink the data drastically is to consolidate "old" data into, say, 1-minute intervals after, say, one month (a sketch follows below). Start thinking about what you want to keep. For example: min/max/avg speed, not just instantaneous speed. Keep an extra record when any significant change occurs (engine on; engine off; airbag deployed; etc.).
(I probably have more tips.)
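A hedged sketch of that consolidation step, reusing the hypothetical car_positions table from the earlier sketch:

    -- Roll raw 5-second samples older than one month into 1-minute summaries
    -- (min/max/avg speed), then drop the raw rows. All names are assumptions.
    CREATE TABLE car_positions_1min (
        car_id    SMALLINT UNSIGNED NOT NULL,
        minute_ts DATETIME NOT NULL,
        min_speed SMALLINT UNSIGNED NOT NULL,
        max_speed SMALLINT UNSIGNED NOT NULL,
        avg_speed SMALLINT UNSIGNED NOT NULL,
        PRIMARY KEY (car_id, minute_ts)
    ) ENGINE=InnoDB;

    SET @cutoff = NOW() - INTERVAL 1 MONTH;

    INSERT INTO car_positions_1min
    SELECT car_id,
           DATE_FORMAT(ts, '%Y-%m-%d %H:%i:00'),
           MIN(speed), MAX(speed), ROUND(AVG(speed))
    FROM car_positions
    WHERE ts < @cutoff
    GROUP BY car_id, DATE_FORMAT(ts, '%Y-%m-%d %H:%i:00');

    -- In practice, delete in smaller chunks (or drop old partitions) rather
    -- than in one huge transaction.
    DELETE FROM car_positions
    WHERE ts < @cutoff;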

Split large MySQL table into multiple files?

I am managing a MySQL server with several large tables (> 500 GB x 4 tables). Each of these tables has similar properties:
All tables have a timestamp column which is part of the primary key
They are never deleted from
They are updated only for a short period of time after being inserted
Most of the reads occur for rows inserted within the past year
Rarely but occasionally reads are done for longer back
We have several years' worth of history (rows) and the rate at which we are accumulating rows is accelerating. Right now these tables are all stored on disk with one file per table. We are starting to run out of disk on the host machine, and while we can buy more disk space, I'd prefer to have a scalable strategy for dealing with ever-increasing data.
Since the older records are never updated and rarely read, I was hoping there would be a way to spread each of these large tables across multiple files, using the timestamp column as a way of segregating the records. Any files corresponding to records more than a year old I'd like to keep on a networked file share. This way they'll remain available to the server and we can limit the amount of local disk needed for the data less than a year old.
I have some further hopes for good things (tm) from such splitting of tables across multiple files, but let's leave it there.
The questions then are
Is this a good idea?
How do I do this within MySQL? I'm open to alternative storage technologies.
Edit: If this approach isn't a good idea, what are some alternatives that meet the dual goals of avoiding single-host storage limitation and keeping the full data set accessible to DB clients?
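Not an endorsement, but a sketch of the closest built-in mechanism to what is described here: RANGE partitioning on the timestamp column, which splits the table into separate files per partition, optionally placed in different directories (requires innodb_file_per_table; recent MySQL versions also require the target directory to be listed in innodb_directories, and keeping live InnoDB files on a network share is generally discouraged, so test carefully). All names below are hypothetical:

    CREATE TABLE events (
        id      BIGINT UNSIGNED NOT NULL,
        ts      DATETIME NOT NULL,
        payload VARCHAR(255),
        PRIMARY KEY (id, ts)          -- the partition column must be in every unique key
    ) ENGINE=InnoDB
    PARTITION BY RANGE (TO_DAYS(ts)) (
        PARTITION p2022 VALUES LESS THAN (TO_DAYS('2023-01-01'))
            DATA DIRECTORY = '/mnt/archive/mysql',
        PARTITION p2023 VALUES LESS THAN (TO_DAYS('2024-01-01'))
            DATA DIRECTORY = '/mnt/archive/mysql',
        PARTITION pmax VALUES LESS THAN MAXVALUE
    );

An alternative that avoids the network-share caveat is to keep all partitions local and periodically move whole old partitions to a separate archive server.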

How to increase database performance if there is 0.1M traffic

I am developing a site and I'm concerned about the performance.
In the current system there are transactions that add 10,000 rows to a single table. On its own, the fact that it takes around 0.6 seconds to insert doesn't matter.
But I am worried about what happens if there are 100,000 concurrent users and 1,000 of those users want to add 10,000 rows to a single table at once.
How could this impact the performance compared to a single user? How can I improve these transactions if there is a large amount of traffic like in this situation?
When write speed is mandatory, the way we tackle it is by getting faster drives.
You mentioned transactions; that means you need your data to be durable (the D in ACID). This requirement rules out the MyISAM storage engine and any type of NoSQL, so I'll focus the answer on what goes on with relational databases.
The way it works is this: you get a set number of Input/Output Operations per Second, or IOPS, per drive. Drives also have a metric called bandwidth. The metric you are interested in is write speed.
A crude calculation here would be: number of MB per second divided by number of IOPS = how much data you can squeeze into each I/O operation.
For mechanical drives, this magic IOPS number is anywhere between 150 and 300 - quite low. Given their bandwidth of about 100 MB/sec, you get a very small number of writes and little bandwidth per write. This is where solid state drives kick in - their IOPS number starts at about 5,000 (some even go to 80,000), which is awesome for databases.
Connecting these drives in RAID gives you a very fast storage solution. If you are able to squeeze 10,000 inserts into one transaction, the disk will try to push all 10k inserts through a single I/O operation (a sketch follows at the end of this answer).
Another strategy is partitioning your table and having multiple drives where MySQL stores the data.
This is as far as you can go with a single MySQL installation. There are strategies for distributing data to multiple MySQL nodes etc. but I assume that's out of scope of your question.
TL;DR: you need quicker disks.
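To make the transaction-batching point concrete, a minimal sketch with a hypothetical items table - the slow pattern pays the durability cost per row, the fast one pays it once per batch:

    -- Slow: autocommit makes every statement its own transaction,
    -- so every row waits for its own log flush.
    INSERT INTO items (a, b) VALUES (1, 'x');
    INSERT INTO items (a, b) VALUES (2, 'y');
    -- ... repeated 10,000 times

    -- Faster: one transaction plus multi-row VALUES lists amortise the I/O.
    START TRANSACTION;
    INSERT INTO items (a, b) VALUES
        (1, 'x'),
        (2, 'y'),
        (3, 'z');   -- extend to hundreds or thousands of rows per statement
    COMMIT;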
If you are trying to scale to inserting millions of rows per second, you have bigger problems. That could add up to trillions of rows per month. That's hundreds of terabytes before the end of the month. Do you have a big enough disk farm for that? Can you afford enough SSDs for that?
Another thing. With a trillion rows, it is quite challenging to have any indexes other than a simple auto_increment. Without any indexes, how do you plan on accessing the data? A table scan of a trillion rows will take day(s).
Also, you said 100,000 users; you implied that they are connected simultaneously? That, too, is a challenge.
What are the users doing to generate 10K rows all at once? What about the network bandwidth?
Etc. Etc.
If you really have a task like this, Sharding is probably the only solution. And that is in addition to SSDs, RAID, IOPs, etc, etc.
A few things you must consider, from both a software and a hardware point of view.
Things to consider:
Go for SSD drives to get better I/O.
It's good to have 10 Gb networking if you have that much traffic.
Use MySQL 5.6 or above; they made good improvements in performance over previous versions.
Use bulk inserts instead of sequential ones, and even better, if you can store all the data in a file, use LOAD DATA INFILE (a sketch follows below). This can be around 20 times faster than regular inserts.
MySQL provides multiple ways to scale out. Which way you want to go depends on your product requirements.
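A hedged sketch of the LOAD DATA approach mentioned above, assuming a CSV file and a hypothetical items table (depending on the variant you need the FILE privilege or local_infile enabled):

    -- Bulk-load a CSV in one statement instead of thousands of INSERTs.
    LOAD DATA LOCAL INFILE '/tmp/items.csv'
    INTO TABLE items
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES      -- skip the header row
    (a, b);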

MySQL Table Locks

I was asked to write some PHP scripts against a MySQL DB to show some data, when I noticed the strange design they had.
They want to perform a study that would require collecting up to 2,000 records per user, and they are automatically creating a new table for each user that registers. It's a pilot study at this stage, so they have around 30 tables, but they should have 3,000 users for the real study.
I wanted to suggest gathering all of them in a single table, but since there might be around 1,500 INSERTs per minute to that database during the study period, I wanted to ask this question here first. Will that cause table locks in MySQL?
So, is it one table with 1,500 INSERTs per minute and a maximum size of 6,000,000 records, or 3,000 tables with 30 INSERTs per minute and a maximum size of 2,000 records each? I would like to suggest the first option, but I want to be sure that it will not cause any issues.
I read that InnoDB has row-level locks. So, will that give better performance combined with the single-table option?
This is a hugely loaded question. In my experience, performance is not really measured accurately by table size alone; it comes down to design. Do you have the primary keys and indexes in place? Is it over-indexed? That being said, I have also found that one trip to the DB is almost always faster than dozens. How big is the single table (columns)? What kind of data are you saving (larger than 4000K?)? It might be that you need to create some prototypes to see what performs best for you. The most I can recommend is that you carefully judge the size of the data you are collecting and allocate accordingly, create indexes (but not too many - don't over-index), and test.
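If it helps the comparison, a hedged sketch of the single-table layout with hypothetical column names - at 1,500 single-row inserts per minute this is a very light load for InnoDB, and row-level locking means concurrent users' inserts don't block each other:

    CREATE TABLE study_records (
        id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        user_id     INT UNSIGNED NOT NULL,
        recorded_at DATETIME NOT NULL,
        value       VARCHAR(255) NOT NULL,       -- whatever the study actually collects
        PRIMARY KEY (id),
        KEY idx_user_time (user_id, recorded_at)  -- keeps per-user queries fast
    ) ENGINE=InnoDB;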