I have an extensive slow query log that has been collecting entries for a few weeks. I would like to parse it so that the most frequently executed queries appear at the top (with the number of executions and the average execution time for each), continuing in descending order from there.
What tool/command can I use to accomplish that?
Check out Maatkit:
mk-query-digest - Parses logs and more. Analyze, transform, filter, review and report on queries.
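A minimal invocation (the log path here is just an example; point it at your own file) feeds it the slow log and captures the report:
mk-query-digest /var/log/mysql_slow_queries.log > slow_digest.txt
The report groups similar queries together and ranks them, showing execution counts and average times per query. Maatkit has since been folded into Percona Toolkit, where the same tool lives on as pt-query-digest.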
For easier reading of the slow log, I use:
mysqldumpslow -s c -t 10 /var/log/mysql_slow_queries.log
Adjust the path to your slow query log. The -s c option sorts the output by the number of occurrences (count), and -t 10 limits it to the top 10 queries.
Sample output:
Reading mysql slow query log from /var/log/mysql_slow_queries.log
Count: 1 Time=21.52s (21s) Lock=0.00s (0s) Rows=3000.0 (3000), foo#localhost
SELECT * FROM students WHERE
(student_grade > N AND student_date < N)
ORDER BY RAND() ASC LIMIT N
Also, calling mysqldumpslow without any options prints some useful settings, such as the current query_cache_size and the path to the slow log.
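If you care more about execution time than frequency, a variant of the same command (same assumed path) sorts by average query time instead of by count:
mysqldumpslow -s at -t 10 /var/log/mysql_slow_queries.log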
This script gives a clearer report than the Maatkit mk-query-digest:
https://github.com/LeeKemp/mysql-slow-query-log-parser/
This online analyzer worked really well for me, and it's free! Just drag and drop your slow query file.
https://www.slowquerylog.com/analyzer
You could use a spreadsheet if you can parse the data into individual fields, then group by query and sort by execution count and average time. Y'know... normal data analysis stuff.
If you need to, write a regex-based application to split the log up into CSV. While you're at it, you could even write a concordance engine for the log names, check averages, and build the tool you're after.
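As a first cut at the "check averages" part, a grep/awk one-liner over the default slow-log format gets you surprisingly far (the path is just an example, and it assumes the standard "# Query_time:" header lines are present):
grep -E '^# Query_time:' /var/log/mysql_slow_queries.log \
  | awk '{ total += $3; n++ } END { if (n) printf "entries: %d  avg Query_time: %.2fs\n", n, total/n }'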
Of course, someone is likely to come in and say "Duh, use 'analysisMySQLProDemon40005K+!'" after I post this.
When I execute \s in the MySQL console, I get a bunch of information, including a lot I don't care about. I just want the slow queries count and the Queries per second avg.
I found a solution for slow queries, because that value is present in the MySQL status table. However, the Queries per second average isn't available in the MySQL status table.
Is there any way to get only the Queries per second average? I could use grep to scrape the information in the SSH console, but I don't want to expose the MySQL password in logs. This process is automated, not manual, so it has to be non-interactive.
I tried to find the information in performance_schema, but it looks like I am missing something. Is this value calculated during execution of the \s command?
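A hedged sketch of one way to do it: the "Queries per second avg" that \s reports is derived from the Questions and Uptime status counters, so it can be recomputed non-interactively. Credentials are read from a client option file (the path below is a placeholder) so the password never appears on the command line or in logs:
mysql --defaults-extra-file=/path/to/client.cnf -N -B \
  -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('Questions','Uptime');" \
  | awk '{ v[$1] = $2 } END { printf "Queries per second avg: %.3f\n", v["Questions"] / v["Uptime"] }'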
I have a large slow query log file.
I want to check the queries for a particular date in the last week.
It's very tough to scan the file and jump to a particular date.
Is there any command to extract the slow log queries for a particular date from that large file?
Any Linux command or something?
Thanks
pt-query-digest has --since and --until. Reference.
And the default output will display the 'worst' queries first.
(See Comments for further tips.)
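For example, to pull just one day out of a large log (the dates and the path are placeholders; substitute your own):
pt-query-digest --since '2015-01-05 00:00:00' --until '2015-01-06 00:00:00' \
    /var/log/mysql_slow_queries.log > digest_2015-01-05.txt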
Based on my research, I thought of using the mysqldumpslow utility to parse the log and extract the results, but I was not able to figure out how to use it. I want to get a count of the queries logged in the slow query log for each 10-minute interval, so that the values can be compared for analysis.
Thanks
You could use logrotate to create a new slow.log every 10 minutes and analyze them one after another, assuming you are using Linux. Be aware that your example shows your MySQL instance is configured with log-queries-not-using-indexes, so you will also get SELECTs that don't use an index in your log file.
Update:
Since I still don't know which OS you are using, a more general approach to your problem would be to redirect the slow log into MySQL itself, following the MySQL docs, and get all records from the slow log table like this:
SELECT COUNT(*) FROM slow_log;
which gives you the total number of queries logged, followed by:
TRUNCATE TABLE slow_log;
Having a script in place doing this every 10 minutes would output the desired information.
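A minimal sketch of such a script (the option-file path is a placeholder, and it assumes log_output=TABLE so that mysql.slow_log is actually populated); run it from cron every 10 minutes:
#!/bin/sh
# Count the slow queries accumulated since the last run, then reset the table.
# Credentials come from a client option file so no password shows up in the process list.
CNF=/path/to/client.cnf
COUNT=$(mysql --defaults-extra-file="$CNF" -N -B -e "SELECT COUNT(*) FROM mysql.slow_log;")
echo "$(date '+%F %T') slow queries in the last interval: $COUNT" >> /var/log/slow_query_counts.log
mysql --defaults-extra-file="$CNF" -e "TRUNCATE TABLE mysql.slow_log;"
TRUNCATE TABLE is explicitly permitted on the log tables, so the counter effectively resets for the next interval.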
I have a month's worth of logfiles (~60 GB uncompressed) and I need to run about 1000 queries against these logfiles. Each logfile is ~68 MB compressed with gzip.
For testing purposes I have installed Hadoop and Hive in pseudo-distributed mode on our test server (8 cores, 32 GB RAM) and loaded the logfiles into a Hive table that looks somewhat like this:
date, time, userid, channel
And I have a file with about 1000 timeframes like this:
date, time-start, time-end
01_01_2015, 08:05:31, 08:09:54
01_01_2015, 08:54:10, 08:54:30
...
02_01_2015, 08:15:14, 08:20:48
...
[edit:] The timeframes on a single day are non-overlapping, with one-second precision. They can be as short as 10 seconds and as long as several minutes.
I want to find out how many unique users were on my site during each of these exact timeframes, with each timeframe evaluated separately.
My question is: what would be the most time-efficient way of handling such a task? Running a thousand different queries in Hive seems like a terrible way of doing this.
The alternative would be to bundle, say, 50-100 queries into one to avoid too much overhead from creating jobs, etc. Would that work better? And is there a limit to how long a query can be in Hive?
While I'm interested in how this could be done with Hadoop, I'm also open to other suggestions (especially considering this runs in pseudo-distributed mode).
Are the timeframes overlapping? If so, would 1-minute chunks of the log be a reasonable way to chunk the data? That is, would there be dozens or hundreds of rows per minute, and do all the timeframes have a resolution of one minute? If not one minute, maybe one hour?
Summarize the data in each 1-minute chunk; put the results in another database table. Then write queries against that table.
That would be the MySQL way to do it, probably on a single machine.
Edit (based on OP's edit showing that ranges are non-overlapping and not conveniently divided):
Given that the ranges are non-overlapping, you should aim for doing the work in a single pass.
I would pick between a Perl/PHP program that does all the work, versus 1000 SQL calls like:
INSERT INTO SummaryTable
SELECT MIN(ts), MAX(ts), SUM(...), COUNT(...)
FROM ...
WHERE ts BETWEEN...
(This assumes an index on ts.) That would be simple enough and fast enough; it would run only slightly slower than the time it takes just to read that much data from disk.
But... Why even put the raw data into a database table? That is a lot of work, with perhaps no long-term benefit. So, I am back to writing a Perl script to read the log file, doing the work as it goes.
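If you go the script route, a rough single-pass sketch is possible even in awk (the comma-separated field layout of date, time, userid, channel and the file names below are assumptions on my part; a Perl script would follow the same shape):
zcat logs/*.gz | awk -F', *' '
  NR == FNR { d[NR] = $1; s[NR] = $2; e[NR] = $3; nframes = NR; next }  # first file: the ~1000 timeframes
  {
    for (i = 1; i <= nframes; i++) {                                    # naive linear scan per line; fine for a sketch
      if ($1 == d[i] && $2 >= s[i] && $2 <= e[i]) {                     # HH:MM:SS compares correctly as a string
        if (!((i, $3) in seen)) { seen[i, $3] = 1; uniq[i]++ }          # count each userid once per timeframe
        break                                                           # frames on a day do not overlap
      }
    }
  }
  END { for (i = 1; i <= nframes; i++) printf "%s, %s-%s: %d unique users\n", d[i], s[i], e[i], uniq[i] }
' timeframes.csv -
The seen array holds one entry per (timeframe, userid) pair, which stays small because the timeframes themselves are short, and the compressed logs are decompressed and read only once.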
I realized that using phpMyAdmin for testing the speed of queries might be dumb: it automatically applies a LIMIT clause.
I tried a certain query on a fairly large number of records (31,595) with a GROUP BY clause. phpMyAdmin, adding LIMIT 0, 200, took 0.1556 seconds to fetch the results.
I decided to try the same query from the command line without the LIMIT clause and it took 0.20 seconds. Great, so now I have the real time it takes for that query.
But the downside is I had to wait for 30,000+ records to print on the screen.
Is there a better solution?
EDIT:
To clarify, I am looking for a way to suppress the screen output of a select query while still getting an accurate time for running the query. And I want it to be something that could be typed in and timed at any time (i.e. I don't want to have to tweak slow log settings to capture results).
You could wrap your query in a SELECT COUNT(1) to count the number of rows returned, without having all the rows printed out:
SELECT COUNT(1)
FROM (
<<your query goes here>>
) t;
I guess that what you really want is to obtain the best possible speed for your query, not really to time it.
If that's the case, type your query in phpMyAdmin (the LIMIT clause it adds doesn't matter here) and then click the "Explain SQL" link to see whether you are using indexes or doing full-table scans.
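The command-line equivalent, if you prefer it, is to prefix the query with EXPLAIN (the database and query below are placeholders for your own):
mysql -u user -p your_db -e "EXPLAIN SELECT student_grade, COUNT(*) FROM students GROUP BY student_grade;"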
You could use the mysql console client and time:
$ time mysql -u user -h host -ppassword -e "show databases;" > /dev/null
real 0m0.036s
user 0m0.008s
sys 0m0.008s
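The same pattern works for timing your own query without scrolling through thousands of rows, since the output is simply discarded (the database and query are placeholders):
time mysql -u user -h host -ppassword your_db -e "SELECT COUNT(*) FROM students GROUP BY student_grade;" > /dev/null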