Logging mysql queries - mysql

I am about to begin developing a logging system for future implementation in a current PHP application to get load and usage statistics from a MySQL database.
The statistics will later be used to get information about database calls per second, query times, etc.
Of course, this will only be used while the app is in its testing stage, since it will most certainly cause some additional load itself.
However, my biggest question right now is whether I should use MySQL to log the queries or go for a file-based system. I'd guess that it would be a bit of a headache to build something that allows writes from multiple locations when using a file-based system to handle the logs?
How would you do it?

Use the general log, which will show client activity, including all the queries:
http://dev.mysql.com/doc/refman/5.1/en/query-log.html
If you need very detailed statistics on how long each query is taking, use the slow log with a long_query_time of 0 (or some other sufficiently short time):
http://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html
Then use http://www.maatkit.org/ to analyze the logs as needed.

MySQL already has logging built in. Chapter 5.2 of the manual describes it. You'll probably be interested in the General Query Log (all queries), the Binary Log (queries that change data) and the Slow Query Log (queries that take too long or, when configured to, queries that don't use indexes).
If you insist on using your own solution, you will want to write a database middle layer that all your DB calls go through, which can handle the timing. As to where you write the results, in development it doesn't matter too much, but the idea of using a second database isn't bad. It doesn't need to be an entirely different database product, just a different instance of MySQL (on a different machine, or on the same machine using a different port). I'd go for a second MySQL instance instead of the filesystem: you'll get all your good SQL functions like SUM and AVG to analyse your data.
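A minimal sketch of such a middle layer, written in Python rather than PHP and using the standard-library sqlite3 module as a stand-in for both MySQL instances. The class name TimedConnection and the query_log table are made up for illustration; only the idea (time every call, store the timing in a second database, aggregate with SQL) comes from the answer above.

```python
import sqlite3
import time


class TimedConnection:
    """Wraps a DB-API connection and records how long each execute() takes.

    Hypothetical helper: app_conn is the application database, log_conn is
    the separate instance that only receives timing rows.
    """

    def __init__(self, app_conn, log_conn):
        self.app_conn = app_conn
        self.log_conn = log_conn
        self.log_conn.execute(
            "CREATE TABLE IF NOT EXISTS query_log ("
            "sql_text TEXT, seconds REAL, logged_at REAL)"
        )

    def execute(self, sql, params=()):
        # Times only the execute() call, not fetching the result rows.
        start = time.perf_counter()
        try:
            return self.app_conn.execute(sql, params)
        finally:
            elapsed = time.perf_counter() - start
            self.log_conn.execute(
                "INSERT INTO query_log VALUES (?, ?, ?)",
                (sql, elapsed, time.time()),
            )
            self.log_conn.commit()

    def stats(self):
        """Return (sql_text, calls, avg_seconds) rows, most frequent first."""
        return self.log_conn.execute(
            "SELECT sql_text, COUNT(*), AVG(seconds) FROM query_log "
            "GROUP BY sql_text ORDER BY COUNT(*) DESC"
        ).fetchall()


app_db = sqlite3.connect(":memory:")  # stand-in for the application database
log_db = sqlite3.connect(":memory:")  # stand-in for the second instance
db = TimedConnection(app_db, log_db)
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
db.execute("INSERT INTO users (name) VALUES (?)", ("bob",))
rows = db.execute("SELECT name FROM users ORDER BY id").fetchall()
```

Because the SQL text is logged with placeholders rather than literal values, identical statements group together in stats(), which is where SUM and AVG become useful.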

If all you are interested in is longer-term, non-real-time analysis, turn on MySQL's regular query logging. There are tons of tools for analysing the query logs (both regular and slow-query), giving you information about run times, average rows returned, etc. That seems to be what you are looking for.

If you are running tests on MySQL, you should store the results in a different database such as Postgres. That way your logging won't add load to the server you are measuring.

I agree with macabail but would only add that you could couple this with a cron job and a simple script to extract and generate any statistics you might want.
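As a rough idea of what such a script might look like, the Python sketch below pulls the Query_time values out of slow-log header lines and summarises them. The sample lines are hand-written for illustration, not real log output, and the exact header layout differs between MySQL versions, so treat the regular expression as an assumption to check against your own log.

```python
import re
import statistics

# Slow-log entries carry a comment line beginning "# Query_time: <seconds>".
QUERY_TIME = re.compile(r"^# Query_time:\s*([\d.]+)")


def query_times(lines):
    """Extract Query_time values (in seconds) from slow-log lines."""
    times = []
    for line in lines:
        match = QUERY_TIME.match(line)
        if match:
            times.append(float(match.group(1)))
    return times


def summarize(times):
    """Return count, average and maximum of the collected timings."""
    if not times:
        return {"count": 0}
    return {
        "count": len(times),
        "avg": statistics.mean(times),
        "max": max(times),
    }


# Hand-written sample in the general shape of a slow log (illustrative only).
sample = """\
# User@Host: app[app] @ localhost []
# Query_time: 0.5  Lock_time: 0.0  Rows_sent: 1  Rows_examined: 10
SELECT * FROM t WHERE id = 1;
# User@Host: app[app] @ localhost []
# Query_time: 1.5  Lock_time: 0.0  Rows_sent: 3  Rows_examined: 300
SELECT * FROM t WHERE name = 'x';
""".splitlines()

summary = summarize(query_times(sample))
```

A cron job could run something like this against the log file at a fixed interval and append the summary to a report.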

Related

Kill Long Running Processes in MySQL

Scenario - you have hundreds of reports running on a slave machine. These reports are either scheduled by MySQL's event scheduler or are called via a Python/R or shell script. Apart from that, there are fifty-odd users connecting to the MySQL slave and running random queries. These people don't really know how to write good queries, and that's fair. They are not supposed to. So, every now and then (read: every day), you see some queries which are stuck because of read/write locks. How do you fix that?
What you do is leave whatever is being written alone and kill all the read queries instead. That is also tricky, because if you kill all the read queries you will also kill OUTFILE queries, which are actually write queries (they just don't write to MySQL, but to disk).
Why killing is necessary (I'm only speaking for MySQL, do not take this out of context)
I have got two words for you - Slave lag. We don't want that to happen, because if that happens, all users, reports, consumers suffer.
I have written the following to kill processes in MySQL based on three questions:
how long has the query been running?
who is running the query?
do you want to kill write/modify queries too?
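To make the three questions concrete, here is a small Python sketch of the selection logic only. It is not the author's procedure: the function names, the dictionary keys (id, user, time, info, loosely modelled on the processlist columns) and the keyword list used to detect writes are all assumptions chosen for illustration.

```python
import re

# Statements treated as writes, including SELECT ... INTO OUTFILE,
# which writes to disk rather than to a table.
WRITE_PATTERN = re.compile(
    r"^\s*(insert|update|delete|replace|alter|create|drop|load)\b"
    r"|\binto\s+outfile\b",
    re.IGNORECASE,
)


def is_write(sql):
    """Return True if the statement looks like a write/modify query."""
    return bool(WRITE_PATTERN.search(sql or ""))


def select_victims(processes, min_seconds, users=None, kill_writes=False):
    """Pick process ids to kill.

    min_seconds: how long has the query been running?
    users:       who is running the query? (None means anyone)
    kill_writes: do you want to kill write/modify queries too?
    """
    victims = []
    for proc in processes:
        sql = proc.get("info")
        if not sql:
            continue  # idle connection, nothing to kill
        if proc["time"] < min_seconds:
            continue
        if users is not None and proc["user"] not in users:
            continue
        if is_write(sql) and not kill_writes:
            continue
        victims.append(proc["id"])
    return victims


# Made-up process list for demonstration.
processes = [
    {"id": 1, "user": "report", "time": 900, "info": "SELECT * FROM big_table"},
    {"id": 2, "user": "report", "time": 900,
     "info": "SELECT * FROM big_table INTO OUTFILE '/tmp/x.csv'"},
    {"id": 3, "user": "etl", "time": 900, "info": "UPDATE t SET a = 1"},
    {"id": 4, "user": "analyst", "time": 5, "info": "SELECT 1"},
    {"id": 5, "user": "analyst", "time": 1200, "info": None},
]

read_only_victims = select_victims(processes, min_seconds=600)
all_victims = select_victims(processes, min_seconds=600, kill_writes=True)
```

Each returned id would then be passed to a KILL statement. Keyword matching like this is only a heuristic, so it is worth reviewing the list before killing anything.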
What I have intentionally not done yet is maintain a history of the processes that have been killed. One should do that in order to analyse and find out who is running all the bad queries. But there are other ways to find that out.
I have created a procedure for this. I haven't spent much time on it, so please tell me whether this is a good way to do it or not.
GitHub Gist
Switch to MariaDB. Versions 10.0 and 10.1 implement several limits and timeouts: https://mariadb.com/kb/en/library/query-limits-and-timeouts/
Then write an API between what the users write and actually hitting the database. In this layer, add the appropriate limitations.

History of queries in MySQL

Is there any way to see the queries that run against my MySQL database?
For example:
I have an application (OTRS) that lets me generate reports according to the parameters I choose. I would like to know which queries the application runs against the database.
I want to use them to integrate with other reporting software.
Is this possible?
Yes, you can enable logging in your MySQL server. There are several types of logs you can use, depending on what you want to log, ranging from errors only or slow queries to logs that record everything done on your server.
See the full doc here
As Nir says, MySQL can log all queries (you should be looking at the general log, or the slow log configured with a threshold of 0 seconds). However, this will show all the queries being run, so on a production system it may prove difficult to match what you are doing in your browser with specific entries in the log.
The reason I suggest using the slow query log is that there are tools available which will remove the parameters from the queries, allowing you to see what SQL code is running more frequently.
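To illustrate what "removing the parameters" means, here is a small Python sketch that replaces literal values with placeholders and counts how often each resulting query shape occurs. It is a simplified stand-in for what the log analysis tools do, not their actual algorithm; the function name fingerprint and the sample queries are made up.

```python
import re
from collections import Counter


def fingerprint(sql):
    """Reduce a query to its shape by replacing literals with '?'."""
    # Quoted strings, allowing backslash-escaped characters inside.
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)
    # Standalone numbers (digits inside identifiers such as t2 are kept).
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)
    # Collapse whitespace and normalise case so equal shapes compare equal.
    return re.sub(r"\s+", " ", sql).strip().lower()


# Made-up queries standing in for log entries.
queries = [
    "SELECT * FROM users WHERE id = 42",
    "select *  from users where id = 7",
    "SELECT * FROM users WHERE name = 'bob'",
]

counts = Counter(fingerprint(q) for q in queries)
most_common = counts.most_common(1)
```

With the literals gone, the two lookups by id collapse into one entry, which makes it easy to see which statements run most often.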
If you have some proficiency in Perl, it should be straightforward to log the queries from the application itself, since all queries are processed via an abstraction layer.
(Presumably you are aware that the schema is published)

MySQL and Hibernate Simultaneous read write

I have a web application which has the following parts:
1) Commentators continuously do match commentary through a browser-based tool. The comments are inserted into the DB using Hibernate.
2) Lots of users access a URL to read the commentary. Hibernate reads data from the table being updated by the commentators in step #1.
3) There are also some stored procedures which are set to run every hour. A few of them access the same table (used in steps #1 and #2) for reading and writing/updating.
Now my problem is that whenever the site has 100+ concurrent users watching a particular match commentary, my MySQL goes down. It shows lots of queries stuck in the processlist. Many of them are in the "Copying to temp table" state. This makes JBoss restart frequently.
I am using transactions in Hibernate for both reading and writing. Please help, because I lose big matches because of these crashes.
You have a performance problem. It is difficult to give solutions that always work. What you can consider is:
1) Review the HQL (Hibernate) statements. The best way is to write a log with <property name="show_sql">true</property> in the config file (or use a tool like log4jdbc if you want to see the actual parameters) and analyse the output. There you can see which SQL requests occur most often. In many cases a better strategy for reading and writing data can significantly reduce database traffic. Also check that you have good indexes on your table.
2) Consider using a second-level cache. (By default Hibernate only uses the first-level cache, which is of no use in your case because it is bound to one session.) Then at least the requests for reading the current commentary can be served from the cache and don't need to go to the database. (Note: the cache might interfere with the stored procedures. Check whether the cache product you want to use supports MySQL stored procedures. In the worst case you have to remove the stored procedures for the critical tables and let your application server do the job, so that it goes through the cache.)
3) If only a few tables are heavily used, you can consider caching them in your application. That's more work, but you can tailor it exactly to the demands of your application, so it might be faster than a general second-level cache.
4) If nothing helps and the traffic really is too heavy, then perhaps you have to invest in more hardware.
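Point 3 could look roughly like the following time-based cache, sketched here in Python for brevity rather than in Java. The class TTLCache, the loader function and the two-second lifetime are illustrative assumptions; the point is only that many readers share one database read per interval.

```python
import time


class TTLCache:
    """Caches loader(key) results for ttl_seconds before reloading."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh, no database access
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key, e.g. right after a commentator posts a new comment."""
        self._entries.pop(key, None)


# Demonstration with a fake loader and a fake clock instead of a database.
calls = []


def load_commentary(match_id):
    calls.append(match_id)  # stands in for the real SELECT
    return ["comment for match %d" % match_id]


now = [0.0]
cache = TTLCache(load_commentary, ttl_seconds=2, clock=lambda: now[0])
cache.get(7)
cache.get(7)      # served from the cache, loader not called again
now[0] = 3.0
latest = cache.get(7)  # entry expired, loader called a second time
```

Rows changed by the stored procedures would only become visible after the entry expires, which is the same interference caveat mentioned for the second-level cache.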
Good luck ;-)

Reduce database writes with memcached

I would like to convert my stats tracking system not to write to the database directly, as we're hitting bottlenecks.
We're currently using memcached for certain aspects of the site, and I wanted to use it for storing stats and committing them to mysql DB periodically.
The issue lies however in the number of items (which is in the millions) for which potentially there could be stats collected between the cronjob runs that would commit them into the database. Other than running a SELECT * FROM data and checking for existence of every single memcache key, and then updating the table.... is there any other way to do this?
(I'm not saying below is gospel, this is just my gut feeling. As said later on, I don't have the specifics of your system :) And obviously no offence meant etc :) )
I would advise against using memcached for this. Memcached is built to quickly retrieve values that you've fetched before, not to store values. The big difference is that if your cache fills up, you'll lose your data.
Normally, you'd just have no data in your cache and re-collect it from the source, which is impossible in this case. That alone would be a reason for me to try and dissuade you from this.
Now you say the major problem is the mysql connection limit you are hitting. If you do simple stuff (like what we talked about in the comments: the insert delayed), it's just a case of increasing the limit. You should probably have enough power to have your scripts/users go to the database once and say "this should eventually be added", and then go away. If your users can't even open 1 connection for that, there's a serious resource problem you probably won't fix by adding extra layers of cache?
Obviously hard to say without any specs of the system, soft and hardware, but my suggestion would be to see if you can just let them open their connections by increasing the limit, and fiddle with the server variables a bit, instead of monkey-patching your system by using a memcached as an in-between layer.
I had a similar issue with statistics data. But please don't use memcached for it. You can't be sure that ALL your items will be moved to the DB. You can lose data and/or process data twice.
You should analyse your bottleneck in terms of how much data you are writing/reading and how many connections you need. And then switch to something scalable like Hadoop, Cassandra, Scribe or similar systems.
You need to provide additional information on the platform that you are running: O/S, database (version), storage engine, RAM, CPU (if possible)?
Are you inserting into a single table or more than one table?
Can you disable the indexes on the tables you are inserting into? Indexes slow down inserts.
Are you running any triggers or stored procedures to compute values as you insert the raw data?

SQL query optimization and debugging

This question is about best practice.
How do you perform a reliable SQL query test?
That is, the question is about optimizing the DB structure and the SQL query itself, not system and DB performance, buffers, or caches.
When you have a complicated query with a lot of joins etc., one day you need to understand how to optimize it, and you come to the EXPLAIN command (mysql::explain, postgresql::explain) to study the execution plan.
After tuning the DB structure you execute the query to see any performance changes, but here you are at the mercy of multiple levels of optimization/buffering/caching. How can this be avoided? I need the pure query execution time and want to be sure it is not affected.
If you know different practices for different servers, please specify them explicitly: mysql, postgresql, mssql, etc.
Thank you.
For Microsoft SQL Server you can use DBCC FREEPROCCACHE (to drop compiled query plans) and DBCC DROPCLEANBUFFERS (to purge the data cache) to ensure that you are starting from a completely uncached state. Then you can profile both uncached and cached performance, and determine your performance accurately in both cases.
Even so, a lot of the time you'll get different results at different times depending on how complex your query is and what else is happening on the server. It's usually wise to test performance multiple times in different operating scenarios to be sure you understand what the full performance profile of the query is.
I'm sure many of these general principles apply to other database platforms as well.
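The advice to measure several times can be sketched as a small harness like the one below. It uses Python with the standard-library sqlite3 module purely as a stand-in for a real server, and the function name time_query is made up. Reporting the first run separately from the minimum and median gives a rough view of cold versus warm behaviour, although it does not flush any cache by itself.

```python
import sqlite3
import statistics
import time


def time_query(conn, sql, params=(), runs=5):
    """Run a query several times and summarise the wall-clock timings."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()  # include fetching all rows
        timings.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "first": timings[0],  # closest thing to an uncached run
        "min": min(timings),
        "median": statistics.median(timings),
    }


# Small throwaway table so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")
conn.executemany("INSERT INTO t (v) VALUES (?)", [(i,) for i in range(1000)])

result = time_query(conn, "SELECT COUNT(*) FROM t WHERE v % 2 = 0")
```

Comparing these numbers before and after a schema change, under otherwise similar conditions, is usually more informative than a single timing.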
In the PostgreSQL world you need to flush the database cache as well as the OS cache as PostgreSQL leverages the OS caching system.
See this link for some discussions.
http://archives.postgresql.org/pgsql-performance/2010-08/msg00295.php
Why do you need pure execution time? It depends on so many factors and is almost meaningless on a live server. I would recommend collecting some statistics from the live server, analysing query execution times using the pgfouine tool (it's for PostgreSQL), and making decisions based on that. The report will show you exactly what you need to tune and how effective your changes were.