Situation:
The client runs a web-based finance application whose primary functionality involves a large volume of financial transactions, both incoming and outgoing.
The processes are automated.
We run several cron jobs at midnight to split the payments among the appropriate customers.
On average we get 2,000 to 3,000 new customers per month, with a total of about 30,000 customers currently.
Our transactional tables hold almost 900,000 records so far, and we expect a drastic increase in the coming months.
Technologies: Initially we used a LAMP environment, with the CodeIgniter framework, Laravel's Eloquent ORM for querying, and MySQL.
Hosting: Hosted on AWS, a t2.small instance, no load balancer implemented.
This application was developed three years ago.
Problem:
Currently our client faces downtime during peak hours, and their customers face slow load times while reviewing their transaction archives and stats.
They also fear that if the cron jobs fail, they will not be able to handle the situation (vast calculations are made and amounts are inserted across a huge number of customers).
Our plan:
So right now, we plan to rework the application from scratch with performance and fault tolerance as our primary goals. And this application has to stay reliable for at least another six to eight years.
Technologies: Node (Sails.js), Angular 5, AWS with a load balancer, AWS RDS (MySQL).
Our approach: From our analysis, we identified a few straightforward reasons for the performance loss. Primarily, there are many customer stats that hit heavy tables.
Most of the stats cover the current month, so we plan to add log tables for the older data and keep only the current month's data in the main table.
So there are going to be many such log tables, which will only ever receive read operations.
Queries:
Is it good to split the read-only tables into a separate database, or can we keep them within a single database?
How does the MySQL buffer cache differ from Redis/memcache? Can memory consumption become a problem as more traffic flows in?
What is the best approach to truncate a few tables at the end of every month (as I mentioned for the log tables)?
Am I proceeding in the right direction?
A million rows is a modest size, not "huge". Since you are having performance problems, I have to believe that they stem from poor indexing and/or poor query formulation.
Find out what queries are having the most trouble. See this for suggestions on using mysqldumpslow -s t or pt-query-digest to locate them.
Provide SHOW CREATE TABLE and EXPLAIN SELECT ... for discussion of how to improve them. It may be as simple as adding a "composite" index.
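To sketch that workflow (the table and column names below, transactions, customer_id, created_at, amount, are placeholders, not taken from your schema):

    -- Enable the slow query log; the 1-second threshold is only illustrative.
    SET GLOBAL slow_query_log = 1;
    SET GLOBAL long_query_time = 1;

    -- For a query that mysqldumpslow / pt-query-digest flags, capture the schema and plan:
    SHOW CREATE TABLE transactions\G
    EXPLAIN SELECT customer_id, SUM(amount)
    FROM   transactions
    WHERE  customer_id = 123
      AND  created_at >= '2018-01-01'
    GROUP  BY customer_id;

    -- If EXPLAIN shows a full table scan, a composite index is often the fix:
    ALTER TABLE transactions ADD INDEX idx_customer_created (customer_id, created_at);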
Another possible performance bottleneck may be repeatedly summarizing old data. If this is the case, then consider the Data Warehousing technique of building and maintaining Summary Tables.
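A minimal sketch of what such a summary table could look like, again with invented names, maintained by the existing midnight cron:

    -- One row per customer per day; the stats pages read this instead of the big table.
    CREATE TABLE transactions_daily_summary (
      customer_id INT NOT NULL,
      txn_date    DATE NOT NULL,
      txn_count   INT NOT NULL,
      txn_total   DECIMAL(14,2) NOT NULL,
      PRIMARY KEY (customer_id, txn_date)
    ) ENGINE=InnoDB;

    -- Fold in yesterday's rows; safe to re-run because duplicates are overwritten.
    INSERT INTO transactions_daily_summary (customer_id, txn_date, txn_count, txn_total)
    SELECT customer_id, DATE(created_at), COUNT(*), SUM(amount)
    FROM   transactions
    WHERE  created_at >= CURDATE() - INTERVAL 1 DAY
      AND  created_at <  CURDATE()
    GROUP  BY customer_id, DATE(created_at)
    ON DUPLICATE KEY UPDATE
      txn_count = VALUES(txn_count),
      txn_total = VALUES(txn_total);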
As for your 4 questions, I tentatively say "no" to each.
The various frameworks tend to make small applications easy to develop, but they start to give trouble when you scale. Still, there are things that can be fixed without abandoning (yet) the frameworks.
AWS, etc, give you lots of reliability and read scaling. But, I repeat, the likely place to look is at the slow queries, not the various ideas you presented.
As for periodic truncation, let's discuss that after seeing what the data looks like and what the business requirements are for data retention.
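That said, purely as a sketch of one common pattern for a monthly cut-over (not a recommendation until the retention requirements are clear), an atomic RENAME TABLE swap keeps the old month's data available under an archive name instead of destroying it; all names here are placeholders:

    -- Create an empty twin of the current-month table, then swap it in atomically.
    CREATE TABLE current_month_txn_new LIKE current_month_txn;

    RENAME TABLE current_month_txn     TO txn_archive_2018_05,   -- archive name is illustrative
                 current_month_txn_new TO current_month_txn;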
Related
I'm studying up on the future of the database I maintain. Right now we have one database server running MySQL using InnoDB and MyISAM tables. I'm watching the metrics closely and I can see that this will not be sustainable forever. Where does one go next? I have reviewed solutions like Cassandra, but I want to stick to an SQL approach so I'm not sure about that. I have also reviewed NDB cluster and federated database solutions, but I've noticed no one has anything good to say about those. Basically, I'm looking for advice on intermediate solutions. We do not yet need a vast multi-node array operating on tens of DB servers, but one server is about to reach its limit. I don't want to just throw another server on the pile without making sure that the DB architecture at hand benefits well from the extra power. What do you suggest for when it is time to move beyond a single server, and how do I manage this transition? Thank you to anyone who can help.
Edit to better explain: At present, we have about a hundred tables. We run many join operations to gather the data the end user needs to see, such that most of our queries join at least two tables to complete any operation. The data set is not too big yet, only a few hundred megs, but the data is accessed in such a way that each table gets a few writes every day, the heaviest of which gets about a thousand writes a day. We probably have a few hundred thousand reads a day too, so reads outnumber writes about 9 to 1.
First Solutions:
Indices go a LONG way
Use profiling software to find your slow queries and optimize them
Depending on your hosting company you can usually update the RAM/CPU of the server
Second Solutions:
Split your reads and your writes into two databases. (I don't know if you're using PHP or not, but PHP has a plugin that will automatically split them for you without having to change any of your code: http://php.net/manual/en/mysqlnd-ms.rwsplit.php)
Use software like memcache to store database information that is frequently queried but not frequently updated
For a customer I am currently investigating improvements to their database structure.
My customer offers holiday rentals on their website.
On their front page they have a search function which sends a query to a MySQL architecture (master-master setup) that answers that query with all the holiday rentals the visitor is interested in.
Due to the growth of the company and the increasing load on their servers, the search queries are currently taking 10+ seconds, mainly because the queries end with an ORDER BY, which causes MySQL to create a temp table and sort all the data; an average search query can return up to 20k holiday homes.
Of course, one of the things we are doing is investigating the queries, rewriting them, and putting indexes where needed. Unfortunately, we are unable to get a lot more performance under these circumstances.
That's why we are looking into implementing Memcached on top of MySQL to cache these large datasets in memory for faster retrieval. Unfortunately, the datasets that the queries return are quite large, which makes Memcached not that effective at this point. The result set that MySQL returns is currently about 15k rows with about 60 values per row.
The reason Memcached is interesting is that we want to drastically improve the search function and lower the load on the MySQL platform, which would make it more scalable.
I am wondering if anyone here is familiar with (long-term) caching of MySQL data in Memcached and with making it more effective for large datasets?
Thanks a bunch!
Memcache is for storing key-value pairs, not for large sets of data. Will it work? Yes. Of course it will. But with how much data you're going to throw at it, you'll run out of memory very soon and end up hitting the database anyway, given how often your search results may change. And remember that just because it's memcache doesn't mean it doesn't have to go over the network to a (most likely) different machine. Your problem seems to be that you're using MySQL for something it was never designed well for: use as a search engine. No matter how many things you optimize, all you're doing is raising the ceiling an inch at a time.
I could take this post in a "you need to optimize MySQL parameters so that it doesn't have to create those temp tables" direction, but I'm going to assume you've already looked into that and keep going.
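Before reaching for a search engine, one narrower mitigation (sketched here with invented table and column names) is an index whose column order matches the WHERE plus the ORDER BY, so MySQL can read rows already in sorted order and skip the sort step for the most common filter combinations:

    ALTER TABLE rentals ADD INDEX idx_region_capacity_price (region_id, capacity, price);

    -- With equality predicates on the leading columns, the ORDER BY is satisfied by the
    -- index itself; EXPLAIN should stop showing "Using filesort" for this shape of query.
    EXPLAIN SELECT *
    FROM   rentals
    WHERE  region_id = 12 AND capacity = 4
    ORDER  BY price
    LIMIT  50;

This only helps when the leading columns are filtered with equality, so it won't cover every search combination, which is exactly where the engines below come in.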
My recommendation is that you implement something on top of MySQL to handle the searching. In my own quest for fast searching, these are the solutions I gave the most weight to:
Sphinx: http://sphinxsearch.com
Solr: http://lucene.apache.org/solr
Elasticsearch: http://www.elasticsearch.org
You'll find plenty of resources here on StackOverflow for which of those is better and faster and what not. For our purposes, we picked Elasticsearch for one of our projects and Solr for another.
The MySQL performance of running Magento in this situation under one MySQL installation is giving us a headache. I wonder if it is feasible to set up an individual MySQL instance for each website so that updates to the catalog can occur concurrently across all websites.
It can certainly be made to work within a cluster if you queue your updates and plan ahead for it. But it won't be cheap, and I'd guess you'll need a MySQL instance for every 30 to 50 websites. It's worth looking at MySQL sharding for heavily used tables and at ways to run all of this in RAM to dramatically pull down the resource usage needed.
And for such a task you have to be a living-and-breathing InnoDB person.
So I've got a MySQL database for an web community that is a potential stats goldmine. Currently I'm serving stats built via all sorts of nasty queries on my well-normalized database. I've run into the "patience limit" for such queries on my shared hosting, and would like to move to data warehousing and a daily cron job, thereby sacrificing instant updates for a 100-fold increase in statistical depth.
I've just started reading about data warehouses, and particularly the star schema, and it all seems pretty straightforward.
My question essentially is: should I toss all that crap into a new database, or just pile the tables into my existing MySQL database? The current database has 47 tables, the largest of which has 30k records. I realize this is paltry compared to your average enterprise application, but your average enterprise application does not (I hope!) run on shared hosting!
So, keeping my hardware limits in mind, which method would be better?
I really don't know much about this at all, but I assume reading Table A, calculating, then updating Table B is a lot easier in the same database than across databases, correct?
Should I even care how many tables my DB has?
If you just need to improve performance, you should just create a set of pre-computed reporting tables. Low effort and big performance gains. With the data volume you described, this won't even have a noticeable impact on the users of your web community.
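A minimal sketch of such a pre-computed reporting table, rebuilt by a daily cron job (the table names and the statistic are invented for illustration):

    CREATE TABLE report_member_activity (
      member_id  INT NOT NULL PRIMARY KEY,
      post_count INT NOT NULL,
      last_post  DATETIME NULL
    ) ENGINE=InnoDB;

    -- Nightly cron: rebuild the snapshot from the normalized tables,
    -- so the stats pages never run the big joins directly.
    TRUNCATE TABLE report_member_activity;
    INSERT INTO report_member_activity (member_id, post_count, last_post)
    SELECT m.id, COUNT(p.id), MAX(p.created_at)
    FROM   members m
    LEFT   JOIN posts p ON p.member_id = m.id
    GROUP  BY m.id;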
The different database approach has several benefits (see below) but I don't think you will gain any of them as you are on a shared database host.
You can support different SLAs for the DW and the web site
The DW and web databases can have different configurations
The DW database is basically read-only for a large portion of the day
The DW and web databases can have different release cycles (this is big)
Typical DW queries (large amounts of data) don't kill the cache of the web DB.
The number of tables in a particular database does not usually become a problem until you have thousands (or tens of thousands) of tables, and these problems usually come into play due to filesystem limits related to the maximum number of files in a directory.
You don't say what storage engine you are using. In general, you want the indexes in your database to fit into memory for good insert/update/delete performance, so the size of your key buffer or buffer pool must be large enough to hold the "hot" part of the index.
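Two read-only checks that make this concrete (nothing here depends on your schema):

    -- Current buffer sizes, in bytes: buffer pool for InnoDB, key buffer for MyISAM indexes.
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
    SHOW VARIABLES LIKE 'key_buffer_size';

    -- Approximate data and index footprint per schema, to compare against those settings.
    SELECT table_schema,
           ROUND(SUM(data_length)  / 1024 / 1024) AS data_mb,
           ROUND(SUM(index_length) / 1024 / 1024) AS index_mb
    FROM   information_schema.tables
    GROUP  BY table_schema;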
Hi Friends
I am using a MySQL DB for one of my products; about 250 schools are signed up for it now. It handles about 1,500,000 insertions per hour and about 12,000,000 insertions per day, and reads are about the same volume as writes. I think my current setup, just a single server, may crash within hours. How can I make it a crash-free DB server? The main problem I am facing now is slowness of both writing and reading data; how can I overcome that? It is very difficult for me to find a solution. Please help me: which is a good model for the solution?
It is difficult to get both fast reads and writes simultaneously. To get fast reads you need to add indexes. To get fast writes you need to have few indexes. And to get both to be fast they must not lock each other.
Depending on your needs, one solution is to have two databases. Write new data to your live database and every so often when it is quiet you can synchronize the data to another database where you can perform queries. The disadvantage of this approach is that data you read will be a little old. This may or may not be a problem depending on what it is you need to do.
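A hedged sketch of that periodic synchronization, assuming the live and reporting schemas are reachable from one connection and rows carry an auto-increment id; every name here is a placeholder:

    -- Run from an off-peak cron job: copy only the rows added since the last sync.
    SELECT COALESCE(MAX(id), 0) INTO @last_synced FROM reporting.attendance;

    INSERT INTO reporting.attendance
    SELECT a.*
    FROM   live.attendance a
    WHERE  a.id > @last_synced;

If the two databases sit on separate servers, the same idea is usually done with MySQL replication instead.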
~500 inserts per second is nothing to sneeze at indeed.
For a flexible solution, you may want to implement some sort of sharding. Probably the easiest solution is to separate schools into groups upfront and store data for different groups of schools on different servers. E.g., data for schools 1-10 is stored on server A, schools 11-20 on server B, etc. This is almost infinitely scalable, assuming that there are few relationships between data from different schools.
Also, you could just try throwing more horsepower at the problem and invest in a RAID of SSD drives; assuming that you have enough processing power, you should be OK. Of course, if it's a huge database, the capacity of SSD drives may not be enough.
Finally, see if you can cut down on the number of insertions, for example by denormalizing the database. Say, instead of storing attendance for each student in a separate row, put the attendance of the entire class as a vector in a single row. Of course, such changes will heavily limit your querying capabilities.
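A sketch of that trade-off with invented names: one row per class per day instead of one row per student, with attendance packed into a single column:

    CREATE TABLE class_attendance (
      class_id INT  NOT NULL,
      att_date DATE NOT NULL,
      presence VARCHAR(255) NOT NULL,  -- e.g. '1101100111', one flag per roster position
      PRIMARY KEY (class_id, att_date)
    ) ENGINE=InnoDB;

    -- One insert per class per day instead of one per student.
    INSERT INTO class_attendance (class_id, att_date, presence)
    VALUES (42, '2014-09-01', '1101100111');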
My laid-back advice is:
Build your application lightweight. Don't use a high-level database abstraction layer like Active Record; they suck at scaling.
Learn a lot about MySQL performance.
Learn about MySQL replication.
Learn about load balancing.
Learn about in-memory caches (memcached).
Hire an administrator (with decent MySQL knowledge) or a web app performance guru/consultant.
The concrete strategy depends on your application and how it is used. MySQL replication may or may not be appropriate (the same applies to the sharding strategy mentioned above), but it's a rather simple way to achieve some scaling, because it doesn't impact your application design too much. In-memory caches can keep some load away from your databases, but they take some work to apply and come with trade-offs. In the end you need a good overall understanding of how to handle a database-driven application under heavy load. If you have a tight deadline, add external manpower, because you won't do this right within 6 weeks without experience.