the question is about the best practice.
How to perform a reliable SQL query test?
That is the question is about optimization of DB structure and SQL query itself not the system and DB performance, buffers, caches.
When you have a complicated query with a lot of joins etc, one day you need to understand how to optimize it and you come to EXPLAIN command (mysql::explain, postresql::explain) to study the execution plan.
After tuning the DB structure you execute the query to see any performance changes but here you're on the pan of multiple level of optimization/buffering/caching. How to avoid this? I need the pure time for the query execution and be sure it is not affected.
If you know different practise for different servers please specify explicitly: mysql, postgresql, mssql etc.
Thank you.
For Microsoft SQL Server you can use DBCC FREEPROCCACHE (to drop compiled query plans) and DBCC DROPCLEANBUFFERS (to purge the data cache) to ensure that you are starting from a completely uncached state. Then you can profile both uncached and cached performance, and determine your performance accurately in both cases.
Even so, a lot of the time you'll get different results at different times depending on how complex your query is and what else is happening on the server. It's usually wise to test performance multiple times in different operating scenarios to be sure you understand what the full performance profile of the query is.
I'm sure many of these general principles apply to other database platforms as well.
In the PostgreSQL world you need to flush the database cache as well as the OS cache as PostgreSQL leverages the OS caching system.
See this link for some discussions.
http://archives.postgresql.org/pgsql-performance/2010-08/msg00295.php
Why do you need pure execution time? It depends on so many factors and almost meaningless on live server. I would recommend to collect some statistic from live server and analyze queries execution time using pgfouine tool (it's for postgresql) and make decisions based on it. You will see exactly what do you need to tune and how effective was your changes on a report.
Related
We are in the process of designing a new system that will either use MySQL or Postgres depending upon the performance.But there are several problems in doing a realistic comparison.I have summed up some of them,it would be helpful if some experts threw some wisdom in here.
Using a neutral performance testing tool
There is something for postgres called explain analyze which basically gives all the details necessary to optimize on the database side.But MySQL does not have something which is as detailed as this one.
Of course these commands give info about a single query, real time performance involves bigger workloads on how the application is going to receive.
How much of this is true ? If a query is slower in postgres and faster in MySQL will it be faster in postgres over heavier workloads, of course only real time tests can tell,but is it worth going in this direction?
I am familiar with Jmeter, but are there any other better tools to do such tasks.
Optimization of both the databases
Postgres is said to be slower for simple reads, but scales well as the data grows and for more complex workloads.Taken from here and here.
With that said,how much optimisation is necessary so that the tests are fair to both database systems.
Any additional points are also welcome.
size of data will have more significance than workload, resource (memory) tuning can have a big effect too.
"With that said,how much optimisation is necessary so that the tests are fair to both database systems."
Is seems to me that the only way to be fair is to do real-world optimisation. Optimise your test systems to as close to production as you can justify. if you're not going to be writing SQL both are going to perform about the same. (+/- $1000 worth of server hardware)
if you're writing SQL you want to keep the programmers happy. ($10000 of programmers won't get you much more performance)
The only realistic performance comparison is with the system that you are designing. Why don't you make your system to be configurable to use either MySql or PostgreSQL then run load tests against it with both databases and compare the performance results? That is what I did in comparing MySql vs PostgreSQL vs Docker in this open source news feed micro-service.
I'm thinking about moving our production env from a self hosted solution to amazon aws. I took a look at the different services and thought about using RDS as replacement for our mysql instances. The hardware we're using for our master seems to be better than the best hardware we can get when using rds (Quadruple Extra Large DB Instance). Since I can't simply move our production env to aws and see if the performance is still good enough I'd love to make some tests in advance.
I thought about creating a full query log from our current master, configure the rds instance and start to replay the full query log against it. Actually I don't even know if this kind of testing is a good idea but I guess you'll tell me if there are better ways to make sure the performance of mysql won't drop dramatically when making the move to rds.
Is there a preferred tool to replay the full query log?
at what metrics should I take a look while running the test
cpu usage?
memory usage?
disk usage?
query time?
anything else?
Thanks in advance
I'd recommend against replaying the query log - it's almost certainly not going to give you the information you want, and will take a significant amount of effort.
Firstly, you'd need to prepare your database so that replaying the query log won't break constraints when inserting, updating or deleting data, and that subsequent "select" queries will find the records they should find. This is distinctly non-trivial on anything other than a toy database - just taking a back-up and replaying the log doesn't necessarily guarantee the ordering of DML statements will match what happened on production. This may well give you a false sense of comfort - all your select statements return in a few milliseconds, because the data they're looking for doesn't exist!
Secondly, load and performance testing rarely works by replaying what happened on production - that doesn't (usually) reflect the peak conditions that will bring your system to its knees. For instance, most production systems run happily most of the time at <50% capacity, but go through spikes during the day, when they might reach 80% or more of capacity - that's what you care about, can your new environment handle the peaks.
My recommendation would be to use a tool like JMeter to write performance scripts (either directly to the database using the JDBC driver, or through the front end if you've got a web appilcation). Your performance scripts should reflect the behaviour you see from users, and be parameterized so they're not dependent on the order in which records are created.
Set yourself some performance targets (ideally based on current production levels, with a multiplier to cover you against spikes), e.g. "100 concurrent users, with no query taking more than 1 second"), and use JMeter to simulate that load. If you reach it first time, congratulations - go home! If not, look at the performance counters to see where the bottleneck is; see if you can alleviate that bottleneck (or tune your queries, your awesome on-premise hardware may be hiding some performance issues). Typical bottlenecks are CPU, RAM, and disk I/O.
Experiment with different test scenarios - "lots of writes", "lots of reads", "lots of reporting queries", and mix them up.
The idea is to understand the bottlenecks on the system, and see how far you are from those bottleneck, and understand what you can do to alleviate them. Once you know that, your decision to migrate will be far more robust.
I have developed some mysql queries for my application, and created indexes and used EXPLAIN statements as well.
What types of testing methods we can use to check queries (performance testing, load testing, concurrency testing and others)
How to use those testing methods in your system, anything related to query testing is helpful to me.
Thanks in advance.
Test with sysbench , it is a great tool for simulating database traffic. Furthermore I would suggest getting familiar with the MySQL EXPLAIN statement as it helps to dissect a query and improve on slow areas.
There are quite a few articles on the internet that explain how to properly benchmark, here is just one of them http://20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/ .
Last but not least there is no substitute for testing with real data. Theoretically speaking certain queries should handle better than others, but the only sure way to know this is by testing your schema with actual data. There is a handly tool named generatedata that creates a lot of dummy data so that you can perform said tests.
In order to properly benchmark your queries you must ensure any cached queries and database information is wiped so that the result times are accurate and independent of one-another; you can do this by performing a RESET QUERY CACHE and FLUSH TABLES before running each query.
Additional information as requested:
From experience the best way to handle concurrency is by using the MySQL SET TRANSACTION statement to properly isolate your queries. By using the InnoDB engine the database will perform row locking which is often sufficient for most applications. You can test this by performing equivalent tasks on the database but with separate transactions. Concurrency is a very broad topic in the database world and I would highly recommend further researching this topic.
I use MySQL (Percona ExtraDB 5.1 to be exact) as my database of choice. Overall, very impressed with performance. The applications that use it are quite large.
We believe that a query is sometimes causing a backup of threads on the database for whatever reason (i.e., memory/buffers). The server has been tweaked countless times to prevent this so it's literally a 1% problem now, but still very annoying. Unless you are monitoring the database server 24/7 you are unlikely to ever see the cause of the backup.
Is there any recommendation (apart from going through the slow query log) which anyone can suggest to track the problematic queries (i.e., reporting via the application)?
Percona Server with XtraDB actually logs both the timestamp and the execution time in microsecond resolution, so you can find the start and the end of the queries precisely. However, log analysis is probably the wrong approach. You probably need to use Aspersa's stalk+collect tools.
As you point out in your question, your best bet will be the slow query log:
http://dev.mysql.com/doc/refman/5.5/en/slow-query-log.html
You might also want to log this at the app level:
At the beginning of your scripts, keep a note of what you're about to do and when it started. At the end of it, log this information if the time spent processing the request is higher than a certain threshold.
That way, you'll be able to identify problematic sequences of queries rather than individual queries. (Which, incidentally, might reveal that no individual query is slow but some requests might fire gazillions of small queries.)
Have a look at this script which allows you to extract a more abstracted representations of the queries causing the problems.
I usually sort the list by the product of frequency and runtime to get the queries causing the most problems.
NB recording the actual start and end of the queries is irrelevant to measuring the queries actually causing locks - from the manual "The time to acquire the initial table locks is not counted as execution time"
You just need to fix the slow stuff.
I am about to begin developing a logging system for future implementation in a current PHP application to get load and usage statistics from a MYSQL database.
The statistic will later on be used to get info about database calls per second, query times etc.
Of course, this will only be used when the app is in testing stage, since It will most certainly cause a bit of additional load itself.
However, my biggest questionmark right now is if i should use MYSQL to log the queries, or go for a file-based system. I'll guess that it would be a bit of a headache to create something that would allow writings from multiple locations when using a file based system to handle the logs?
How would you do it?
Use the general log, which will show client activity, including all the queries:
http://dev.mysql.com/doc/refman/5.1/en/query-log.html
If you need very detailed statistics on how long each query is taking, use the slow log with a long_query_time of 0 (or some other sufficiently short time):
http://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html
Then use http://www.maatkit.org/ to analyze the logs as needed.
MySQL already had logging built in- Chapter 5.2 of the manual describes these. You'll probably be interested in The General Query Log (all queries), the Binary Query Log (queries that change data) and the Slow log (queries that take too long, or don't use indexes).
If you insist on using your own solution, you will want to write a database middle layer that all your DB calls go through, which can handle the timing aspects. As to where you write them, if you're in devel, it doesn't matter too much, but the idea of using a second db isn't bad. You don't need to use an entirely separate DB, just as far as using a different instance of MySQL (on a different machine, or just a different instance using a different port). I'd go for using a second MySQL instance instead of the filesystem- you'll get all your good SQL functions like SUM and AVG to parse your data.
If all you are interested in is longer-term, non-real time analysis, turn on MySQL's regular query logging. There are tons of tools for doing analysis on the query-logs (both regular and slow-query), giving you information about the run-times, average rows returned, etc. Seems to be what you are looking for.
If you are doing tests on MySQL you should store the results in a different database such as Postgres, this way you won't increase the load with your operations.
I agree with macabail but would only add that you could couple this with a cron job and a simple script to extract and generate any statistics you might want.