Must-have features for F/OSS SQL analyzer/optimizer

Must-have features for F/OSS SQL analyzer/optimizer - mysql

I like MySQL's Query Analyzer...but not the price tag. I think I can write something myself to do analysis on slow query logs, indexes, table status fields, etc. and offer it as an alternative, F/OSS solution.
What would be your top-requested features for such a solution?

Freeware
Scalable (multi computer database)
Multiprocess
Query Comparer among databases (transfer schema to eg: Postgresql then run the query there too)
Import txt/sql/server logs and pick up queries from the log for analyze
Analyze w simulation of server load / query analyzer during ~duress (low memory, low tmp, high processor usage)
Analyze under different SQL server configuration profiles (auto change config file, restart etc...)
A tunnel script file (php,cgi) to run the analyze on a server without public connection access.

Ability to manage multiple connections to different servers / profiles
Maybe nice to have:
"On the fly" mode: Throw in a table declaration and a query, software creates table in temporary database and explains query

Maatkit's mk-query-digest does most of this and is open-source. It's not a GUI, but I find the data more useful and flexible than the query analyzer. Using the tcpdump mode provides much of what query analyzer provides without the proxy overhead.
Perhaps looking at integrating Maatkit or at least Maatkit's ideas into a GUI would be useful.

Related

Amazon RDS MySQL/Aurora query sometimes hangs forever. Any 2 cents on the metrics and approaches we can triage it and prevent it from happening?

Just some contexts: In our old data pipeline system, we are running MySQL 5.6. or Aurora on Amazon rds. Bad thing about our old data pipeline is running a lot of heavy computations on the database servers because we are handcuffed by what was designed: treating transactional databases as data warehouse and our backend API directly “fishing” the databases heavily in our old system. We are currently patching this old data pipeline, while re-design the new data warehouse in SnowFlake.
In our old data pipeline system, the data pipeline calculation is a series of sequential MySQL queries. As our data grows bigger and bigger in the old data pipeline, what the problem now is the calculation might just hang forever at, for example, the step 3 MySQL query, while all metrics in Amazon CloudWatch/ grafana we are monitoring (CPU, database connections, freeable memory, network throughput, swap usages, read latency, available storage, write latency, etc. ) looks normal. The MySQL slow query log is not really helpful here because each of our query in the data pipeline is essentially slow anyway (can takes hours to run a query because the old data pipeline is running a lot of heavy computations on the database servers). The way we usually solve these problems is to “blindly” upgrade the MySQL/Aurora Amazon rds service and hoping it will solve the issue. I am wondering
(1) What are the recommended database metrics in MySQL 5.6. or Aurora on Amazon rds we should monitor real-time to help us identify why a query freezes forever? Like innodb_buffer_pool_size?
(2) Is there any existing tool and/or in-house approach where we can predict how many hardware resources we need before we can confidently execute a query and know it will succeed? Could someone share some 2 cents?
One thought: Since Amazon rds sometimes is a bit blackbox, one possible way is to host our own MySQL server on an Amazon EC2 instance in parallel to our Amazon MySQL 5.6/Aurora rds production server, so we can ssh into MySQL server and run a lot of command tools like mytop (https://www.tecmint.com/mysql-performance-monitoring/) to gather a lot more real time MySQL metrics which can help us triage the issue. Open to any 2 cents from gurus. Thank you!

None of the tools mentioned at that link should need to run on the database server itself, and to the extent that this is true, there should be no difference in their behavior if they aren't. Run them on any Linux server, giving the appropriate --host and --user and --password arguments (in whatever form they may expect). Even mysqladmin works remotely. Most of the MySQL command line tools do (such as the mysql cli, mysqldump, mysqlbinlog, and even mysqlcheck).
There is no magic coupling that most administrative utilities can gain by running on the same server as MySQL Server itself -- this is a common misconception but, in fact, even when running on the same machine, they still have to make a connection to the server, just like any other client. They may connect to the unix socket locally rather than using TCP, but it's still an ordinary client connection, and provides no extra capabilities.
It is also possible to run an external replica of an RDS/MySQL or Aurora/MySQL server on your own EC2 instance (or in your own data center, even). But this isn't likely to tell you a whole lot that you can't learn from the RDS metrics, particularly in light of the above. (Note also, that even replica servers acquire their replication streams using an ordinary client connection back to the master server.)
Avoid the temptation to tweak server parameters. On RDS, most of the defaults are quite sane, and unless you know specifically and precisely why you want to adjust a parameter... don't do it.
The most likely explanation for slow queries... is poorly written queries and/or poorly designed indexes.
If you are not familiar with EXPLAIN SELECT, then you need to learn it, live it, an love it. SQL is declarative, not procedural. That is, SQL tells the server what you want -- not specifically how to obtain it internall. For example: SELECT ... FROM x JOIN y tells the server to match up the rows from table x and y ON a certain criteria, but does not tell the server whether to read from x then find the matching rows in y... or read from y and find the matching rows in x. The net result is the same either way -- it doesn't matter which table the server examines first, internally -- but if the query or the indexes don't allow the server to correctly deduce the optimum path to the results you've requested, it can spend countless hours churning through unnecessary effort.
Take for an extreme and overly-simplified example, a table with millions of rows and table with 1 row. It would make sense to read the small table first, so you know what 1 value you're trying to join in the large table. It would make no sense to read throuh each row in the large table, then go over and check the small table for a match for each of the millions of rows. The order in which you join tables can be different than the order in which the actual joining is done.
And that's where EXPLAIN comes in. This allows you to inspect the query plan -- the strategy the internal query optimizer has concluded will get it to the answer you need with the least amount of effort. This is the core of the magic of relational database systems -- finding the correct solution in the optimal time, based on what it knows about the data. EXPLAIN shows you the order in which the tables are being accessed, how they're being joined, which indexes are being used, and an estimate of the number of rows from each table are involved -- and these numbers multiply together to give you an estimate of the number of permutations involved in resolving your query. Two small tables, each with 50,000 rows, joined without a proper index, means an entirely unreasonable 2,500,000,000 unique combinations between the two tables that must be evaluated; every row must be compared to every other row. In short, if this turns out to be the kind of thing that you are (unknowingly) asking the server to do, then you are definitely doing something wrong. Inspecting your query plan should be second nature any time you write a complex query, to ensure that the server is using a sensible strategy to resolve it.
The output is cryptic, but secret decoder rings are available.
https://dev.mysql.com/doc/refman/5.7/en/explain.html#explain-execution-plan

History of queries in MySql

Is there any way to check the query that occurs in my MySql database?
For example:
I have an application (OTRS) that allows you to generate reports according to the frames that I desire. I would like to know which query is made by the application in the database.
Because I will use it to integrate with other reporting software.
Is this possible?

Yes, you can enable logging in your MySQL server. there are several types of logs you can use, depending on what you want to log, starting from errors only or slow queries, and to logs that write everything done on your server.
See the full doc here

Although, as Nir says, mysql can log all queries (you should be looking at the general log or the slow log configured with a threshold of 0 seconds) this will show all the queries being run; on a production system it may prove difficult to match what you are doing in your browser with specific entries in the log.
The reason I suggest using the slow query log is that there are tools available which will remove the parameters from the queries, allowing you to see what SQL code is running more frequently.
If you have some proficiency in Perl it should be straightforward to output - all queries are processed via an abstraction layer.
(Presumably you are aware that the schema is published)

SQLite faster than MySQL?

I want to set up a teamspeak 3 server. I can choose between SQLite and MySQL as database. Well I usually tend to "do not use SQLite in production". But on the other hand, it's a teamspeak server. Well okay, just let me google this... I found this:
Speed
SQLite3 is much faster than MySQL database. It's because file database is always faster than unix socket. When I requested edit of channel it took about 0.5-1 sec on MySQL database (127.0.0.1) and almost instantly (0.1 sec) on SQLite 3. [...]
http://forum.teamspeak.com/showthread.php/77126-SQLite-vs-MySQL-Answer-is-here
I don't want to start a SQLite vs MySQL debate. I just want to ask: Is his argument even valid? I can't imagine it's true what he says. But unfortunately I'm not expert enough to answer this question myself.
Maybe TeamSpeak dev's have some major differences in their db architecture between SQLite and MySQL which explains a huge difference in speed (I can't imagine this).

At First Access Time will Appear Faster in SQLite
The access time for SQLite will appear faster at first instance, but this is with a small number of users online. SQLite uses a very simplistic access algorithm, its fast but does not handle concurrency.
As the database starts to grow, and the amount of simultaneous access it will start to suffer. The way servers handle multiple requests is completely different and way more complex and optimized for high concurrency. For example, SQLite will lock the whole table if an update is going on, and queue the orders.
RDBMS's Makes a lot of extra work that make them more Scalable
MySQL for example, even with a single user will create an access QUEUE, lock tables partially instead of allowing only single user-per time executions, and other pretty complex tasks in order to make sure the database is still accessible for any other simultaneous access.
This will make a single user connection slower, but pays off in the future, when 100's of users are online, and in this case, the simple
"LOCK THE WHOLE TABLE AND EXECUTE A SINGLE QUERY EACH TIME"
procedure of SQLite will hog the server.
SQLite is made for simplicity and Self Contained Database Applications.
If you are expecting to have 10 simultaneous access writing at the database at a time SQLite may perform well, but you won't want an 100 user application that constant writes and reads data to the database using SQLite. It wasn't designed for such scenario, and it will trash resources.
Considering your TeamSpeak scenario you are likely to be ok with SQLite, even for some business it is OK, some websites need databases that will be read only unless when adding new content.
For this kind of uses SQLite is a cheap, easy to implement, self contained, perfect solution that will get the job done.

The relevant difference is that SQLite uses a much simpler locking algorithm (a simple global database lock).
Using fine-grained locking (as MySQL and most other DB servers do) is much more complex, and slower if there is only a single database user, but required if you want to allow more concurrency.

I have not personally tested SQLite vs MySQL, but it is easy to find examples on the web that say the opposite (for instance). You do ask a question that is not quite so religious: is that argument valid?
First, the essence of the argument is somewhat specious. A Unix socket would be used to communicate to a database server. A "file database" seems to refer to the fact that communication is through a compiled-in interface. In the terminology of SQLite, it is server-less. Most databases store data in files, so the terminology "file database" is a little misleading.
Performance of a database involves multiple factors, such as:
Communication of query to the database.
Speed of compilation (ability to store pre-compiled queries is a plus here).
Speed of processing.
Ability to handle complex processing.
Compiler optimizations and execution engine algorithms.
Communication of results back to the application.
Having the interface be compiled-in affects the first and last of these. There is nothing that prevents a server-less database from excelling at the rest. However, database servers are typically millions of lines of code -- much larger than SQLite. A lot of this supports extra functionality. Some of it supports improved optimizations and better algorithms.
As with most performance questions, the answer is to test the systems yourself on your data in your environment. Being server-less is not an automatic performance gain. Having a server doesn't make a database "better". They are different applications designed for different optimization points.

In short:
For Local application databses, single user applications, and little simple projects keeping small data SQLite is winner.
For Network database applications, multiuser and concurrency, load balancing and growing data managements, security and roll based authentications, big projects and widely used services you should choose MySql.
In your question I do not know much about teamspeak servers and what kind of data it actually needs to keep in its database but if it just needs a local DBMS and not needs to proccess lots of concurrency and managements SQLite will be my choice.

How to calculate or see the performance of my database in mysql?

I want to check the performance of my database in mysql. I googled and came to know about show full processlist etc commands, but not very clear. i just want to know and calaculate the performance of database in terms of how much heap memory it is taking and other such.
Is there any way to know and assess the performance of the database. so that I can optimize and improve the performance.
Thanks in advance

The basic tool is MySQL Workbench which will work with any recent version of MySQL. It's not as powerful as the enterprise version, but is a great place to start.
The configuration can be exposed with SHOW VARIABLES and the current state of the system is exposed through SHOW STATUS. These status numbers are what ends up being graphed in most tools.
Don't forget that you can do a lot of monitoring on the application side, turning on database logs for instance. Barring that you can enable the "slow query" log in MySQL to check which queries are having the most impact. These can then be diagnosed with EXPLAIN.

Download mysql enterprise tools. They will allow you to monitor load on the server as well as performance of individual queries.

You can use open source tools from Percona called as Percona Toolkit and start using some useful tools which can help you in Efficiently archive rows, Find duplicate indexes, Summarize MySQL servers, Analyze queries from logs and tcpdump and Collect vital system information when problems occur.
You can try experimenting with Performance_Schema tables avialable in MySQL v5.6 onwards which can give a detailed information of query, database statistics.
http://www.markleith.co.uk/2012/07/04/mysql-performance-schema-statement-digests/

Logging mysql queries

I am about to begin developing a logging system for future implementation in a current PHP application to get load and usage statistics from a MYSQL database.
The statistic will later on be used to get info about database calls per second, query times etc.
Of course, this will only be used when the app is in testing stage, since It will most certainly cause a bit of additional load itself.
However, my biggest questionmark right now is if i should use MYSQL to log the queries, or go for a file-based system. I'll guess that it would be a bit of a headache to create something that would allow writings from multiple locations when using a file based system to handle the logs?
How would you do it?

Use the general log, which will show client activity, including all the queries:
http://dev.mysql.com/doc/refman/5.1/en/query-log.html
If you need very detailed statistics on how long each query is taking, use the slow log with a long_query_time of 0 (or some other sufficiently short time):
http://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html
Then use http://www.maatkit.org/ to analyze the logs as needed.

MySQL already had logging built in- Chapter 5.2 of the manual describes these. You'll probably be interested in The General Query Log (all queries), the Binary Query Log (queries that change data) and the Slow log (queries that take too long, or don't use indexes).
If you insist on using your own solution, you will want to write a database middle layer that all your DB calls go through, which can handle the timing aspects. As to where you write them, if you're in devel, it doesn't matter too much, but the idea of using a second db isn't bad. You don't need to use an entirely separate DB, just as far as using a different instance of MySQL (on a different machine, or just a different instance using a different port). I'd go for using a second MySQL instance instead of the filesystem- you'll get all your good SQL functions like SUM and AVG to parse your data.

If all you are interested in is longer-term, non-real time analysis, turn on MySQL's regular query logging. There are tons of tools for doing analysis on the query-logs (both regular and slow-query), giving you information about the run-times, average rows returned, etc. Seems to be what you are looking for.

If you are doing tests on MySQL you should store the results in a different database such as Postgres, this way you won't increase the load with your operations.

I agree with macabail but would only add that you could couple this with a cron job and a simple script to extract and generate any statistics you might want.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008