We have a VPS with around 40 databases, and one of them is getting very high traffic (SELECTs). Is there any tool that I can install to record or visualize which database is getting the most traffic/hits?
Thanks
MySQL Enterprise Monitor
Advanced Query Analyzer - Provides multiple options for quickly finding worst offenders. Collected queries can be sorted and filtered by server, database, query type, query content. Analysis can be done by current time intervals or by historical date/time range.
MySQL and OS Metric Graphs - Allows DBA to monitor and correlate these metrics: Database Activity (Selects, Inserts, Updates, Deletes)
I believe it has a free trial.
Situation:
The client runs a web-based finance application whose primary functionality involves a huge volume of financial transactions, both incoming and outgoing.
The processes are automated.
We run several cron job tasks at midnight to split the payments for appropriate customers.
On average we gain 2,000 to 3,000 new customers per month, and we currently have 30,000 customers in total.
Our transactional tables have almost 900,000 records so far, and we expect a drastic increase in the coming months.
Technologies: Initially we used a LAMP environment with the CodeIgniter framework, Laravel's Eloquent ORM for querying, and MySQL.
Hosting: Hosted on AWS on a t2.small instance, with no load balancer.
This application was developed three years ago.
Problem:
Currently our client faces downtime during peak hours, and their customers face slow load times while reviewing their transaction archives and stats.
They also fear that if the cron job tasks fail, they would not be able to handle the situation (vast calculations are made and amounts are inserted across a huge volume of customers).
Our plan:
So right now we plan to rebuild the application from scratch, with performance and fault tolerance as our primary goals. The application has to be reliable for at least another six to eight years.
Technologies: Node (Sails.js), Angular 5, AWS with load balancer, AWS RDS (MySQL)
Our approach: From our analysis, we found a few straightforward reasons for the performance loss. Primarily, there are many stats for customers which access heavy tables.
Most of the stats are for the current month. So we plan to add log tables for these and keep only the current month's data in the specific table.
So there are going to be many such log tables, which will only have read operations.
Queries:
Is it good to split the read-only tables into a separate database, or can we keep them within a single database?
How does the MySQL buffer cache differ from Redis / Memcached? Does any memory consumption problem occur when more traffic flows in?
What is the best approach to truncate a few tables at the end of every month (the log tables I mentioned)?
Am I proceeding in the right direction?
A million rows is a modest size, not "huge". Since you are having performance problems, I have to believe that it stems from poor indexing and/or poor query formulation.
Find out what queries are having the most trouble. See this for suggestions on using mysqldumpslow -s t or pt-query-digest to locate them.
Provide SHOW CREATE TABLE and EXPLAIN SELECT ... for discussion of how to improve them. It may be as simple as adding a "composite" index.
Another possible performance bottleneck may be repeatedly summarizing old data. If this is the case, then consider the Data Warehousing technique of building and maintaining Summary Tables.
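As a rough illustration of the summary-table idea, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the table names, column names, and helper functions are all hypothetical rather than taken from the application in the question. Each new transaction is folded into a per-customer, per-month summary row, so the stats pages read one small row instead of re-aggregating the detail table.

```python
import sqlite3

# SQLite in memory stands in for MySQL; every table and column name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        customer_id  INTEGER NOT NULL,
        tx_date      TEXT    NOT NULL,      -- 'YYYY-MM-DD'
        amount_cents INTEGER NOT NULL
    );
    CREATE TABLE monthly_summary (
        customer_id  INTEGER NOT NULL,
        month        TEXT    NOT NULL,      -- 'YYYY-MM'
        tx_count     INTEGER NOT NULL,
        total_cents  INTEGER NOT NULL,
        PRIMARY KEY (customer_id, month)
    );
""")


def record_transaction(customer_id, tx_date, amount_cents):
    """Store the detail row and fold it into the summary row in one transaction."""
    month = tx_date[:7]
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO transactions VALUES (?, ?, ?)",
            (customer_id, tx_date, amount_cents),
        )
        updated = conn.execute(
            "UPDATE monthly_summary"
            " SET tx_count = tx_count + 1, total_cents = total_cents + ?"
            " WHERE customer_id = ? AND month = ?",
            (amount_cents, customer_id, month),
        )
        if updated.rowcount == 0:  # first transaction for this customer and month
            conn.execute(
                "INSERT INTO monthly_summary VALUES (?, ?, 1, ?)",
                (customer_id, month, amount_cents),
            )


def monthly_stats(customer_id, month):
    """Read the stats from the small summary table instead of scanning the details."""
    return conn.execute(
        "SELECT tx_count, total_cents FROM monthly_summary"
        " WHERE customer_id = ? AND month = ?",
        (customer_id, month),
    ).fetchone()


record_transaction(1, "2018-03-05", 10_000)
record_transaction(1, "2018-03-20", 5_050)
record_transaction(1, "2018-04-01", 2_500)
print(monthly_stats(1, "2018-03"))  # (2, 15050)
```

In MySQL the update-or-insert step would more typically be a single INSERT ... ON DUPLICATE KEY UPDATE statement. The two-step version here is only to keep the sketch portable.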
As for your 4 questions, I tentatively say "no" to each.
The various frameworks tend to make small applications easy to develop, but they start to give trouble when you scale. Still, there are things that can be fixed without abandoning (yet) the frameworks.
AWS, etc, give you lots of reliability and read scaling. But, I repeat, the likely place to look is at the slow queries, not the various ideas you presented.
As for periodic truncation, let's discuss that after seeing what the data looks like and what the business requirements are for data retention.
Some context: in our old data pipeline we run MySQL 5.6 or Aurora on Amazon RDS. The bad thing about the old pipeline is that it runs a lot of heavy computation on the database servers, because we are handcuffed by the original design: transactional databases are treated as a data warehouse, and our backend API heavily "fishes" the databases directly. We are currently patching the old pipeline while redesigning the new data warehouse in Snowflake.
In the old pipeline, the calculation is a series of sequential MySQL queries. As our data keeps growing, the problem is that the calculation may hang indefinitely at, for example, the step 3 query, while all the metrics we monitor in Amazon CloudWatch/Grafana (CPU, database connections, freeable memory, network throughput, swap usage, read latency, available storage, write latency, etc.) look normal. The MySQL slow query log is not much help here because every query in the pipeline is slow anyway (a single query can take hours because the old pipeline runs a lot of heavy computation on the database servers). We usually solve these problems by "blindly" upgrading the MySQL/Aurora Amazon RDS instance and hoping that fixes the issue. I am wondering:
(1) What are the recommended database metrics in MySQL 5.6 or Aurora on Amazon RDS that we should monitor in real time to help us identify why a query freezes forever? Something like innodb_buffer_pool_size?
(2) Is there any existing tool and/or in-house approach where we can predict how many hardware resources we need before we can confidently execute a query and know it will succeed? Could someone share some 2 cents?
One thought: since Amazon RDS is sometimes a bit of a black box, one option is to host our own MySQL server on an Amazon EC2 instance in parallel with our Amazon MySQL 5.6/Aurora RDS production server, so that we can ssh into the MySQL server and run command-line tools like mytop (https://www.tecmint.com/mysql-performance-monitoring/) to gather many more real-time MySQL metrics that could help us triage the issue. Open to any 2 cents from gurus. Thank you!
None of the tools mentioned at that link should need to run on the database server itself, and to the extent that this is true, there should be no difference in their behavior if they aren't. Run them on any Linux server, giving the appropriate --host and --user and --password arguments (in whatever form they may expect). Even mysqladmin works remotely. Most of the MySQL command line tools do (such as the mysql cli, mysqldump, mysqlbinlog, and even mysqlcheck).
There is no magic coupling that most administrative utilities can gain by running on the same server as MySQL Server itself -- this is a common misconception but, in fact, even when running on the same machine, they still have to make a connection to the server, just like any other client. They may connect to the unix socket locally rather than using TCP, but it's still an ordinary client connection, and provides no extra capabilities.
It is also possible to run an external replica of an RDS/MySQL or Aurora/MySQL server on your own EC2 instance (or in your own data center, even). But this isn't likely to tell you a whole lot that you can't learn from the RDS metrics, particularly in light of the above. (Note also, that even replica servers acquire their replication streams using an ordinary client connection back to the master server.)
Avoid the temptation to tweak server parameters. On RDS, most of the defaults are quite sane, and unless you know specifically and precisely why you want to adjust a parameter... don't do it.
The most likely explanation for slow queries... is poorly written queries and/or poorly designed indexes.
If you are not familiar with EXPLAIN SELECT, then you need to learn it, live it, and love it. SQL is declarative, not procedural. That is, SQL tells the server what you want -- not specifically how to obtain it internally. For example: SELECT ... FROM x JOIN y tells the server to match up the rows from tables x and y ON certain criteria, but does not tell the server whether to read from x then find the matching rows in y... or read from y and find the matching rows in x. The net result is the same either way -- it doesn't matter which table the server examines first, internally -- but if the query or the indexes don't allow the server to correctly deduce the optimum path to the results you've requested, it can spend countless hours churning through unnecessary effort.
Take, as an extreme and overly simplified example, a table with millions of rows and a table with 1 row. It would make sense to read the small table first, so you know what 1 value you're trying to join in the large table. It would make no sense to read through each row in the large table, then go over and check the small table for a match for each of the millions of rows. The order in which you write the joins can be different from the order in which the actual joining is done.
And that's where EXPLAIN comes in. This allows you to inspect the query plan -- the strategy the internal query optimizer has concluded will get it to the answer you need with the least amount of effort. This is the core of the magic of relational database systems -- finding the correct solution in the optimal time, based on what it knows about the data. EXPLAIN shows you the order in which the tables are being accessed, how they're being joined, which indexes are being used, and an estimate of the number of rows from each table that are involved -- and these numbers multiply together to give you an estimate of the number of permutations involved in resolving your query. Two small tables, each with 50,000 rows, joined without a proper index, mean an entirely unreasonable 2,500,000,000 unique combinations between the two tables that must be evaluated; every row in one table must be compared to every row in the other. In short, if this turns out to be the kind of thing that you are (unknowingly) asking the server to do, then you are definitely doing something wrong. Inspecting your query plan should be second nature any time you write a complex query, to ensure that the server is using a sensible strategy to resolve it.
The output is cryptic, but secret decoder rings are available.
https://dev.mysql.com/doc/refman/5.7/en/explain.html#explain-execution-plan
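The row-count multiplication described above is easy to check by hand. Here is a small Python sketch of it. The function name is made up for illustration, and the "indexed" case assumes the optimizer estimates one matching row in the second table per row of the first, which is what a lookup on a unique index would typically look like in the rows column of EXPLAIN.

```python
from math import prod


def estimated_row_combinations(rows_per_table):
    """Multiply the per-table row estimates, as read from the rows column of EXPLAIN."""
    return prod(rows_per_table)


# Two 50,000-row tables joined with no usable index: every pairing is examined.
unindexed = estimated_row_combinations([50_000, 50_000])

# Same join, assuming an index lets the server find 1 matching row per outer row.
indexed = estimated_row_combinations([50_000, 1])

print(f"{unindexed:,}")  # 2,500,000,000
print(f"{indexed:,}")    # 50,000
```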
We are planning to rewrite a legacy system that uses a MySQL InnoDB database, and we are trying to analyse the main bottlenecks that should be avoided in the next version.
The system has many services/jobs that run overnight and generate data (inserts/updates); these are the main things to optimize. The jobs currently run for 2-3 hours on average.
We have already gathered the long-running queries that must be optimized.
But I am wondering whether it is possible to gather information and statistics about long-running transactions.
It would be very helpful to know which tables are locked by transactions the most: average locking time, lock type, periods.
Could somebody advise a tool or script that can gather such information?
Or maybe someone can share their own experience in database analysis and optimization?
MySQL has a built-in capability for capturing "slow" query statistics (but to get an accurate picture you need to set the slow threshold, long_query_time, to 0). You can turn the log into useful information with mysqldumpslow (bundled with MySQL). I like the Percona Toolkit, but there are lots of other tools available.
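To give a feel for what such tools do with the log, here is a toy Python sketch of the aggregation step. It is not how mysqldumpslow or the Percona tools are implemented. It assumes the usual slow-log layout (a "# Query_time: ..." comment line followed by the statement), assumes every statement fits on one line, and runs against a made-up sample log. Literals are replaced with placeholders so that queries differing only in their values are counted together.

```python
import re
from collections import defaultdict

# Made-up sample in the usual slow-log layout; the tables and values are hypothetical.
SAMPLE_LOG = """\
# Time: 2018-03-01T00:00:01.000000Z
# User@Host: app[app] @ localhost []  Id:     5
# Query_time: 2.000000  Lock_time: 0.000100 Rows_sent: 1  Rows_examined: 100000
SET timestamp=1519862401;
SELECT * FROM payments WHERE customer_id = 42;
# Time: 2018-03-01T00:00:05.000000Z
# User@Host: app[app] @ localhost []  Id:     6
# Query_time: 3.000000  Lock_time: 0.000100 Rows_sent: 1  Rows_examined: 120000
SET timestamp=1519862405;
SELECT * FROM payments WHERE customer_id = 7;
# Time: 2018-03-01T00:00:06.000000Z
# User@Host: app[app] @ localhost []  Id:     7
# Query_time: 0.500000  Lock_time: 0.000200 Rows_sent: 0  Rows_examined: 1
SET timestamp=1519862406;
UPDATE customers SET name = 'Bob' WHERE id = 9;
"""


def normalize(sql):
    """Replace literals with placeholders so similar queries group together."""
    sql = re.sub(r"'[^']*'", "'S'", sql)   # quoted strings
    sql = re.sub(r"\b\d+\b", "N", sql)     # standalone numbers
    return " ".join(sql.split())


def summarize(log_text):
    """Return (pattern, count, total_seconds) tuples, slowest total first."""
    totals = defaultdict(lambda: [0, 0.0])
    query_time = None
    for line in log_text.splitlines():
        match = re.match(r"# Query_time: ([\d.]+)", line)
        if match:
            query_time = float(match.group(1))
        elif line.startswith(("#", "SET timestamp=", "use ")):
            continue  # other header lines carry no statement text
        elif query_time is not None and line.strip():
            entry = totals[normalize(line)]
            entry[0] += 1
            entry[1] += query_time
            query_time = None  # single-line statements assumed
    return sorted(
        ((pattern, count, total) for pattern, (count, total) in totals.items()),
        key=lambda row: row[2],
        reverse=True,
    )


for pattern, count, total in summarize(SAMPLE_LOG):
    print(f"{total:8.2f}s  x{count}  {pattern}")
```

For real logs, use the existing tools: they handle multi-line statements, lock times, row counts, and many more edge cases than this sketch does.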
We have to design an SQL Server 2008 R2 database storing many varbinary blobs.
Each blob will be around 40 KB, and there will be around 700,000 additional entries a day.
The estimated maximum size of the database is 25 TB (30 months).
The blobs will never change. They will only be stored and retrieved.
The blobs will be either deleted the same day they are added, or only during cleanup after 30 months. In between there will be no change.
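For reference, the 25 TB estimate can be reproduced with a few lines of arithmetic. This sketch assumes decimal units (1 GB = 1,000,000 KB), an average month of 365/12 days, and no allowance for row overhead, indexes, or same-day deletions.

```python
BLOB_KB = 40                 # approximate size of one blob
NEW_BLOBS_PER_DAY = 700_000
RETENTION_MONTHS = 30

# Assumption: an average month is 365 / 12 days.
retention_days = RETENTION_MONTHS * 365 / 12

# Assumption: decimal units, 1 GB = 1,000,000 KB and 1 TB = 1,000 GB.
daily_gb = BLOB_KB * NEW_BLOBS_PER_DAY / 1_000_000
total_tb = daily_gb * retention_days / 1_000

print(f"{daily_gb:.0f} GB per day")
print(f"{total_tb:.2f} TB after {RETENTION_MONTHS} months")
```

This works out to roughly 28 GB of new blob data per day and about 25.5 TB over the retention period, which is in line with the estimate above.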
Of course we will need table partitioning, but the general question is, what do we need to consider during implementation for a functioning backup (to tape) and restore strategy?
Thanks for any recommendations!
Take a look at the "piecemeal backup and restore" - you will find it very useful for your scenario, which would benefit from different backup schedules for different filegroups/partitions. Here are a couple of articles to get you started:
http://msdn.microsoft.com/en-us/library/ms177425(v=sql.120).aspx
http://msdn.microsoft.com/en-us/library/dn387567(v=sql.120).aspx
I have had the pleasure in the past of working with several very large databases, the largest environment I have worked with being in the 5+ TB range. Going even larger than that, I am sure that you will encounter some unique challenges that I may not have faced.
What I can say for sure is that any backup strategy you implement is going to take a while, so you should plan to devote at least one day a week to backups and maintenance, during which the database, while available, should not be expected to perform at its usual level.
Second, I have found the following MVP article to be extremely useful in planning backups taken through the native MSSQL backup operations. There are some options for the backup command, specific to large databases, that could help reduce your backup duration. While these increase throughput, you can expect a performance impact. Specifically, the options that had the greatest impact in my testing were BUFFERCOUNT, BLOCKSIZE, and MAXTRANSFERSIZE.
http://henkvandervalk.com/how-to-increase-sql-database-full-backup-speed-using-compression-and-solid-state-disks
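One thing worth checking before experimenting with these options is how much memory the backup buffers will take, since the total is BUFFERCOUNT multiplied by MAXTRANSFERSIZE. Below is a small Python sketch that validates a set of values and prints the resulting BACKUP statement. The database name, file path, and option values are placeholders for illustration, not recommendations, and the helper functions are hypothetical.

```python
# Limits as documented for the T-SQL BACKUP statement.
VALID_BLOCKSIZES = {512, 1024, 2048, 4096, 8192, 16384, 32768, 65536}
MAX_TRANSFER_STEP = 64 * 1024           # MAXTRANSFERSIZE is a multiple of 64 KB
MAX_TRANSFER_LIMIT = 4 * 1024 * 1024    # and at most 4 MB


def buffer_memory_mb(buffercount, maxtransfersize):
    """Total memory used by backup buffers: BUFFERCOUNT * MAXTRANSFERSIZE."""
    return buffercount * maxtransfersize / (1024 * 1024)


def backup_statement(database, path, buffercount, blocksize, maxtransfersize):
    """Build a BACKUP DATABASE statement after checking the option values."""
    if buffercount < 1:
        raise ValueError("BUFFERCOUNT must be a positive integer")
    if blocksize not in VALID_BLOCKSIZES:
        raise ValueError("unsupported BLOCKSIZE")
    if (maxtransfersize % MAX_TRANSFER_STEP
            or not MAX_TRANSFER_STEP <= maxtransfersize <= MAX_TRANSFER_LIMIT):
        raise ValueError("MAXTRANSFERSIZE must be a multiple of 64 KB, up to 4 MB")
    # Names are interpolated directly; only use trusted values here.
    return (
        f"BACKUP DATABASE [{database}] TO DISK = N'{path}' "
        f"WITH BUFFERCOUNT = {buffercount}, BLOCKSIZE = {blocksize}, "
        f"MAXTRANSFERSIZE = {maxtransfersize};"
    )


# Placeholder values only: measure on your own hardware before settling on any.
statement = backup_statement("BlobStore", r"X:\backup\BlobStore.bak", 50, 65536, 4194304)
print(statement)
print(buffer_memory_mb(50, 4194304), "MB of backup buffers")  # 200.0 MB
```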
Additionally, assuming your data is stored on a SAN, you may wish as an alternative to investigate the use of SAN level tools in your backup strategy. Some SAN vendors provide software which integrates with SQL Server to perform SAN style snapshot backups while still integrating with the engine to handle things like marking backup dates and forwarding LSN values.
Based on your statement that the majority of the data will not change over time, differential backups seem like a very useful option for you, allowing you to reduce the number of transaction logs that would have to be restored in a recovery scenario.
Please feel free to get in touch with me directly if you would like to discuss further.
Consider a database containing hundreds of millions of rows and reaching 500 GB in size, with maybe ~20 users. It is mostly storage for aggregated data to be reported on later.
Would SQL Azure be able to handle this scenario? If so, does it make sense to go that route? Compared to purchasing and housing 2+ high end servers ($15k-$20k each) in a co-location facility + all maintenance and backups.
Did you consider using the Azure Table storage? Azure Tables do not have referential integrity, but if you are simply storing many rows, is that an option for you? You could use SQL Azure for your transactional needs, and use Azure Tables for those tables that do not fit in SQL Azure. Also, Azure Tables will be cheaper.
SQL Azure databases are limited to 50 GB (at the moment), as described in the General Guidelines and Limitations.
I don't know whether SQL Azure is able to handle your scenario - 500GB seems a lot and does not figure in the pricing list (50GB max). I'm just trying to give perspective about the pricing.
Official pricing of SQL Azure is around $10 per GB per month (http://www.microsoft.com/windowsazure/pricing/).
Therefore, 500 GB would cost roughly $5k each month. Two high-end servers at $20k each ($40k in total, excluding license fees, maintenance and backups) would pay for themselves in about 8 months.
Or, from another point of view: assuming you replace your servers every 4 years, does a budget of $240k ($5k * 48 months) cover the hardware, installation/configuration, licence fees and maintenance costs? (Not counting bandwidth and backup, since you'll pay extra for those when using SQL Azure too.)
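The comparison can be laid out as a short calculation. This is only a sketch using the rough figures quoted in this answer ($10 per GB per month, 500 GB, two servers at $20k each, a 4-year replacement cycle), not current pricing.

```python
# Rough figures from the discussion above, all in USD.
PRICE_PER_GB_MONTH = 10
DATABASE_GB = 500
SERVER_PRICE = 20_000
SERVER_COUNT = 2
REPLACEMENT_CYCLE_MONTHS = 4 * 12

monthly_cloud_cost = PRICE_PER_GB_MONTH * DATABASE_GB
hardware_cost = SERVER_PRICE * SERVER_COUNT
break_even_months = hardware_cost / monthly_cloud_cost
cloud_cost_over_cycle = monthly_cloud_cost * REPLACEMENT_CYCLE_MONTHS

print(f"Cloud cost per month:       ${monthly_cloud_cost:,}")      # $5,000
print(f"Hardware purchase:          ${hardware_cost:,}")           # $40,000
print(f"Hardware paid off after:    {break_even_months:.0f} months")  # 8 months
print(f"Cloud cost over 4 years:    ${cloud_cost_over_cycle:,}")   # $240,000
```

The hardware figure leaves out licences, maintenance, co-location and backups, so the real break-even point would be later than 8 months.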
One option would be to use SQL Azure sharding. This is a way to spread the data over multiple SQL Azure databases and has the advantage that each database would use a different CPU & hard drive (since each db is actually stored on a different machine in the data center) which should give you really good performance. Of course, this is under the assumption that your database has the possibility of being sharded. There is some more info on this sharding here.