I have a table in MySQL at the moment: 7.3 million rows, 1.5 GB in size if I run the query from this question:
How to get the sizes of the tables of a MySQL database?
I'm trying to get a handle on what a full table scan of that table in AWS Aurora would cost me.
AWS lists it as:
I/O Rate - $0.200 per 1 million requests
But how do I possibly translate that into "what will this cost me"?
See also https://stackoverflow.com/a/6927400/122441
As an example, a medium sized website database might be 100 GB in size and expect to average 100 I/Os per second over the course of a month. This would translate to $10 per month in storage costs (100 GB x $0.10/month), and approximately $26 per month in request costs (~2.6 million seconds/month x 100 I/O per second * $0.10 per million I/O).
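For the 1.5 GB table above, a rough back-of-envelope sketch can be worked out directly. This assumes Aurora reads in 16 KB pages and a completely cold cache, so every page read is one billed I/O request; actual billing depends on how much of the table is already in the buffer pool.

```python
# Rough cost estimate for one full table scan on Aurora.
# Assumptions: 16 KB page size, cold cache (every page = 1 billed I/O).
TABLE_SIZE_BYTES = 1.5 * 1024**3   # 1.5 GB table from the question
PAGE_SIZE_BYTES = 16 * 1024        # assumed Aurora page size
PRICE_PER_MILLION_IO = 0.20        # $0.20 per 1 million requests

pages = TABLE_SIZE_BYTES / PAGE_SIZE_BYTES          # ~98,304 I/O requests
cost = pages / 1_000_000 * PRICE_PER_MILLION_IO     # ~$0.02 per full scan

print(f"{pages:,.0f} I/O requests, ~${cost:.4f} per scan")
```

Under these assumptions, even a daily full scan of this table would cost well under a dollar a month in I/O charges; at this size, storage rather than requests dominates the bill.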
I need to compare a few scenarios, which can be fulfilled either by:
fetching an additional timestamp column from the MySQL database, or
looping over the resultant array.
Elaborating further:
CASE 1: 144-byte columns + a 4-byte timestamp column for 10K rows, then looping over an array of size 50 (download size: 1,480,000 bytes).
CASE 2: 144-byte columns for 10K rows, looping over an array of size 10,000 (download size: 1,440,000 bytes).
The data download is roughly 40 KB more for Case 1, while there are 10,000 more loop iterations for Case 2.
Which of the two scenarios would be faster: downloading 40 KB more, or 10,000 more loop iterations?
Your first scenario is by far the best. Here is why.
SQL is designed to extract subsets of rows from tables. It's designed to allow your tables to be many orders of magnitude bigger than your RAM. If you use a query like
SELECT *
FROM mytable
WHERE mytimestamp >= CURDATE() - INTERVAL 1 DAY
  AND mytimestamp < CURDATE()
you will get all the rows with timestamps anytime yesterday, for example. If you put an index on the mytimestamp column, SQL will satisfy this query very quickly and only give the rows you need. And it will satisfy any queries looking for a range of timestamps similarly quickly.
There are no answers that are true in 100% of situations. For example, I would prefer to do the first query when I use a fast enough network (anything 1Gbps or higher) and data transfer is free. The difference in the example you show is 40,000 bytes, but it's only 2.7% of the total.
On the other hand, if you need to optimize to minimize bandwidth usage, that's a different scenario. Like if you are transmitting data to a satellite over a very slow link. Or you have a lot of competing network traffic (enough to use up all the bandwidth), and saving 2.7% is significant. Or if you are using a cloud vendor that charges for total bytes transferred on the network.
You aren't taking into account the overhead of executing 1000 separate queries. That means 1000x the bytes sent to the database server, as you send queries. That takes some network bandwidth too. Also the database server needs to parse and optimize each query (MySQL does not cache optimization plans as some other RDBMS products do). And then begin executing the query, starting with an index lookup without the context of the previous query result.
"Performance" is a slippery term. You seem to be concerned only with bandwidth use, but there are other ways of measuring performance.
Throughput (bytes per second)
Latency (seconds per response)
Wait time (seconds until the request begins)
All of these can be affected by the total load, i.e. number of concurrent requests. A busy system may have traffic congestion or request queueing.
You can see that this isn't a simple problem.
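To make the bandwidth side of the trade-off concrete, here is a back-of-envelope sketch. The link speeds are illustrative assumptions, not measurements:

```python
# Back-of-envelope comparison of the two cases from the question.
# The bandwidth figures below are assumptions for illustration.
EXTRA_BYTES = 40_000        # Case 1 downloads ~40 KB more
EXTRA_ITERATIONS = 10_000   # Case 2 loops 10,000 more times

for label, bits_per_sec in [("1 Gbps LAN", 1e9), ("10 Mbps link", 1e7)]:
    transfer_s = EXTRA_BYTES * 8 / bits_per_sec
    print(f"{label}: extra transfer takes {transfer_s * 1000:.2f} ms")

# A simple in-memory loop runs millions of iterations per second even in
# an interpreted language, so 10,000 extra iterations cost on the order
# of a millisecond -- comparable to the 40 KB transfer on a fast LAN.
```

On a fast network both costs are sub-millisecond-to-millisecond noise; the per-query overheads described above, not these two numbers, are what actually dominate.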
Is GCP persistent disk pricing prorated for short term use like retrieving data from a backup snapshot?
Pricing for GCP balanced storage is $0.1 per GB per month.
If I create a 100 GB disk for 1 day only in a 30-day month, will I be billed:
(1) 0.1 * 100 = $10?
Or
(2) 0.1 * 100 / 30 days = $0.33?
With Amazon EBS it seems to be (2) according to this page. But according to GCE pricing, although not clear, Google seems more like (1).
Yes, the disk pricing is always prorated. However, you can use the pricing calculator to get a good estimate of any product pricing.
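Since billing is prorated, option (2) is the right model; a sketch of the calculation (the $0.10/GB-month price is from the question, and day-granularity proration is a simplification of the provider's per-second billing):

```python
# Prorated disk cost: billed only for the time the disk exists.
# $0.10/GB-month is from the question; day-level proration is assumed.
PRICE_PER_GB_MONTH = 0.10
disk_gb = 100
days_used = 1
days_in_month = 30

cost = PRICE_PER_GB_MONTH * disk_gb * (days_used / days_in_month)
print(f"${cost:.2f}")   # ~$0.33 for one day of a 100 GB disk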
Considering moving my MySQL architecture to AWS DynamoDB. My application has a requirement of 1,000 r/w requests per second. How does this play with PHP/updates? Having 1,000 workers process DynamoDB r/w's seems like it will have a higher toll on CPU/Memory than MySQL r/w's.
I have thought about using a log file to store the updated information, then creating scripts to process the log files to remove DB load; however, I'm stymied by file locking. I'd be curious if anyone has ideas on implementing this: 300 separate scripts would be writing to a single log file. The log file could then be processed into the DB every minute. I'm not sure how this could be implemented without locking. The server script is written in PHP.
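On the log-file idea: on POSIX systems, opening the file with O_APPEND makes each individual write() land atomically at the end of the file, so many processes can append small records to one log without explicit locking (for small writes on a local filesystem; NFS does not give this guarantee). Sketched in Python for brevity; PHP's fopen($path, 'a') sets the same flag. The path and record format are hypothetical.

```python
import os

def append_log(path: str, line: str) -> None:
    """Append one record; O_APPEND makes the seek+write atomic per write()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, (line.rstrip("\n") + "\n").encode())
    finally:
        os.close(fd)

# Each worker appends independently; a separate cron job can read the
# file, replay the lines into MySQL in one batch, and truncate it.
append_log("/tmp/updates.log", "id=42,field=price,value=9.99")
```

One write per record is the key constraint: building the full line first and issuing a single write() is what keeps concurrent appenders from interleaving mid-record.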
Current Setup
MYSQL Database (RDS on AWS)
table A has 5m records: the main DB table, 30 columns, mostly numerical + text <500 chars (growth: +30k records per day). Has relationships with 4 other tables containing:
table b - 15m records (Growth +90k records per day).
table c - 2m records (Growth +10k records per day).
table d - 4m records (Growth +15k records per day).
table e - 1m records (Growth +5k records per day).
Table A updates around 1,000 records per second; updated/added rows are then queued for adding to Solr search.
Would appreciate some much needed advice to lower costs. Are there hidden costs or other solutions I should be aware of before starting development?
I'm afraid the scope for performance improvement for your DB is just too broad.
IOPS: some devops choose to provision 200 GB of storage (200 × 3 = 600 baseline IOPS) rather than buying provisioned IOPS for smaller storage (say they only need 50 GB, then purchase provisioned IOPS on top). You need to open a spreadsheet to find the pricing/performance sweet spot.
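That spreadsheet exercise can be sketched directly. The 3-IOPS-per-GB baseline mirrors gp2-style storage, and all the unit prices below are assumptions for illustration, not current AWS rates:

```python
# Compare "oversize storage for baseline IOPS" vs provisioned IOPS.
# The 3 IOPS/GB baseline and all prices below are illustrative assumptions.
GP2_PRICE_PER_GB = 0.115      # assumed $/GB-month, baseline-IOPS storage
IO1_PRICE_PER_GB = 0.125      # assumed $/GB-month, provisioned-IOPS storage
IO1_PRICE_PER_IOPS = 0.065    # assumed $/provisioned-IOPS-month

def gp2_cost(gb: int):
    """Return (baseline IOPS, monthly $) for oversized baseline storage."""
    return 3 * gb, GP2_PRICE_PER_GB * gb

def io1_cost(gb: int, iops: int) -> float:
    """Monthly $ for small storage plus explicitly provisioned IOPS."""
    return IO1_PRICE_PER_GB * gb + IO1_PRICE_PER_IOPS * iops

iops, cost = gp2_cost(200)    # 600 IOPS by oversizing to 200 GB
alt = io1_cost(50, 600)       # 600 provisioned IOPS on the 50 GB you need
print(iops, round(cost, 2), round(alt, 2))
```

With these assumed prices, oversizing storage is the far cheaper route to 600 IOPS, which is exactly the sweet spot the answer is pointing at; rerun with real current prices before deciding.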
You might need to create another denormalised table from table A, if you frequently select from table A but don't traverse the whole text <500 chars column. Don't underestimate the text workload.
Index, index, index.
If you deal with tons of non-linear searches, perhaps copy the relevant part of the data to DynamoDB if you think it will improve performance, but test it first and maintain the RDBMS structure.
There is no one-size-fits-all solution. Please also consider using a message queue if required.
Adding 200k records/day is actually not much for today's RDBMSs. Even 1,000 IOPS only happens in bursts. If querying is the heaviest part, then you need to optimize that part.
I am analyzing how to store over 10,000 signals 50 times per second. I will probably read them from memory. Each signal has a timestamp (8 bytes) and a double (8 bytes). This process will run 4 hours, 1 day a week. Then:
10,000 × 50 × 16 bytes = 8 MB/second.
8 MB/s × 3,600 × 4 = ~115 GB/week.
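Checking that arithmetic:

```python
signals = 10_000
rate_hz = 50
bytes_per_sample = 8 + 8            # 8-byte timestamp + 8-byte double

per_second = signals * rate_hz * bytes_per_sample   # bytes written per second
per_session = per_second * 3600 * 4                 # one 4-hour run per week
print(per_second / 1e6, per_session / 1e9)          # 8.0 MB/s, 115.2 GB/week
```

So the figures in the question hold: a sustained 8 MB/s write rate is well within a single modern disk's capability, which matters for the clustering question below.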
What database (or other option, like files) should I use to store this data quickly? Are MongoDB or Cassandra good options? What language would be good? Is Java fast enough to read data from memory and store it in the database, or is C a better choice?
Is a cluster solution needed?
Thanks.
Based on your description, I'd suggest an SQLite database. It's very lightweight and faster than MySQL and MongoDB.
See benchmark here.
That is roughly 700~800 MB of data per single day, so if you need to query it after one month, 25 GB will be scanned.
In this case you probably will need a clustered/sharded solution to split the load.
As data will grow constantly, you need a dynamic solution which can use MongoDB shards and replica sets to spread the load and manage data distribution.
I read somewhere that there is a guiding principle to limit the number of rows in tables to below 1 million. I was wondering if this is true. For a project, I will have tables with roughly 10,000 rows, 40,000 rows, 160,000 rows, and 100,000 rows respectively. What performance could I expect on a 4-core, 8 GB machine for this? (I know some people achieve 20k requests per second.)
The one million number is bogus. I've run MySQL instances with tables that have 20 million rows and over a dozen columns. Queries weren't fast, but the application was offline data processing and performance was more than adequate.
You should benchmark queries on your own system to determine its performance. I can't tell you anything about a system with "4 cores, 8 GB" beyond the fact that 8 GB is a good starting point for a big database server (you should be able to keep your indexes in memory, and smaller tables may also fit in memory). Four cores could be plenty of CPU, but it might not be. It depends entirely on what sort of cores they are.
You also shouldn't ignore disk performance, particularly if your tables won't fit in RAM. I think the machine I had 20m rows on had a RAID 1+0 array of 15k RPM disks.
But don't take my comments to mean that you need more RAM, more CPUs or faster disks. Run some benchmarks on your own system. Make sure your system has an appropriate schema for your queries. Make sure you have as few subqueries and views (results aren't indexed) as possible. Make sure your tables are properly indexed. Then look at your performance and hardware.
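As a minimal illustration of why "properly indexed" matters more than the row count, here is a self-contained sketch. It uses Python's bundled SQLite purely to keep the example runnable without a server; MySQL behaves analogously, and the 200,000-row table is an arbitrary size chosen for the demo:

```python
import sqlite3
import time

# Build a 200k-row table in memory (arbitrary size for illustration).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, val TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                ((i, f"row{i}") for i in range(200_000)))

def bench(query: str, arg) -> float:
    """Time a single query execution in seconds."""
    start = time.perf_counter()
    con.execute(query, (arg,)).fetchone()
    return time.perf_counter() - start

before = bench("SELECT val FROM t WHERE id = ?", 199_999)   # full table scan
con.execute("CREATE INDEX idx_id ON t (id)")
after = bench("SELECT val FROM t WHERE id = ?", 199_999)    # index lookup
print(f"scan {before*1000:.2f} ms vs indexed {after*1000:.3f} ms")
```

The same exercise against your real schema and real queries (MySQL's EXPLAIN will show you whether an index is actually used) is worth far more than any rule of thumb about row counts.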