How can I log "show processlist" when there are more than n queries? - mysql

Our mysql processes can sometimes get backlogged and processes begin queuing up. I'd like to debug when and why this occurs by logging the processlist during slow times.
I'd like to run show full processlist; via a cron job and save the output to a text file if there are more than 50 rows returned.
Can you point me in the right direction?
For example:
echo "show full processlist;" | mysql -uroot > processlist-`date +%F-%H-%M`.log
I'd like to run that only when the result contains the text 50 rows in set (or greater than 50 rows).

pt-stalk is designed for this exact purpose. It samples the process list every second (or whatever time you specify), then when a threshold is reached (Threads_running is the default and is what you want in this case), collects a whole bunch of data, including disk activity, tcpdumps, multiple samples of the process list, server status variables, mutex/innodb status, and a bunch more.
Here's how to start it:
pt-stalk --daemonize --dest /var/lib/pt-stalk --collect-tcpdump --threshold 50 --cycles 1 --disk-pct-free 20 --retention-time 3 -- --defaults-file=/etc/percona-toolkit/pt-stalk_my.cnf
The command above will sample Threads_running (--threshold; set this to your value for n) every second (the default --interval) and fire a data collection if Threads_running is greater than 50 for 1 consecutive sample (--cycles). 3 days (--retention-time) of samples will be kept, and collection will not fire if less than 20% of your disk is free (--disk-pct-free). At each collection, a pcap-format tcpdump will be executed (--collect-tcpdump), which can be analyzed with either conventional tcpdump tools or a number of other Percona Toolkit tools, including pt-query-digest and pt-tcp-model. There will be a 5 minute rest between collections (the default --sleep) to prevent you from DoS'ing yourself. The process will be daemonized (--daemonize). The parameters after -- will be passed to all mysql/mysqladmin commands, so that is a good place to set options like --defaults-file, where you can store your login credentials away from prying eyes.

First of all, make sure MySQL's slow query log isn't what you need. Also, mysql's -e parameter allows you to specify a query on the command line.
Turning the logic around, this saves the process list and removes it when the process list isn't long enough:
date=$(date +...) # set the desired date format here
# 51 lines = 50 rows plus the column-header line mysql prints
[ $(mysql -uroot -e "show full processlist" | tee "plist-$date.log" | wc -l) -lt 51 ] && rm "plist-$date.log"
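Putting that together as a cron-able script (a sketch; the log directory is an assumption, and the date format is borrowed from your example):
#!/bin/bash
# Save the full processlist; keep the file only when it holds 50 or more rows.
date=$(date +%F-%H-%M)
log="/var/log/mysql-plist/plist-$date.log"   # hypothetical location
# 51 lines = 50 rows plus the column-header line mysql prints
[ "$(mysql -uroot -e 'show full processlist' | tee "$log" | wc -l)" -lt 51 ] && rm "$log"
Schedule it from cron as often as you want to sample, e.g. * * * * * /usr/local/bin/log-processlist.sh (path hypothetical).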

Bash - Faster way to check for file changes than md5?

I've got a MySQL DB set up on my system for local testing, and I'm monitoring the tables to see when a change is made.
Step 1 - Go to DIR
cd /usr/local/mysql-5.7.16-osx10.11-x86_64/data/blog_atom_tables/
Step 2 - Run Script
watchDB
Where watchDB() is (slightly modified for readability)...
function watchDB() {
    declare -A aa   # associative array of table names and their md5 hashes
    declare k       # holder for current md5
    while true; do  # run forever
        # loop through all table files within the directory
        for i in *.ibd; do
            k=$(sudo md5 -q "$i")   # md5 of file (table)
            name=${i%%.*}           # table name, i.e. the filename without its extension
            if [[ -z ${aa[$name]} ]]; then
                # table has not been hashed yet
                aa[$name]=$k
            elif [[ ${aa[$name]} != "$k" ]]; then
                # table has been hashed before and the md5 differs (i.e. table changed)
                echo "$i"
                aa[$name]=$k
            fi
        done
    done
}
TL;DR Loop through all the table files within the directory, save a copy of each md5, and continue looping through checking for a change.
I don't need to see what rows/columns have been changed, only that the table itself is different. For the most part, this works exactly as I want, but calculating the md5 for every table takes a noticeable amount of time. For only 25 tables, it takes between 3 and 5 seconds to execute each loop.
Is there a quicker way to do this, other than md5? I'd use something like cmp, but I need to save a reference of the current state of the file, so I have something to compare it against.
This is only about 1/6 of the total tables that will eventually be in there, so any improvement on speed is welcome.
While it's not really checking the content of the file, you could use file system attributes as a simple way to monitor for changes. Unless the filesystem is mounted with the timestamps disabled, you can monitor the access time and modification time timestamps:
stat -f "%m" <filename>
The filesystem driver knows when reads and writes occur and subsequently updates the timestamps.
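For instance, here is a minimal sketch adapting the question's watch loop to compare mtimes instead of md5 hashes (stat -f "%m" is the BSD/macOS form; GNU stat spells it stat -c "%Y"):
function watchTS() {
    declare -A ts   # table filename -> last seen modification time
    declare m
    while true; do
        for i in *.ibd; do
            m=$(stat -f "%m" "$i")        # epoch mtime; far cheaper than hashing
            if [[ -z ${ts[$i]} ]]; then
                ts[$i]=$m                 # first pass: just record it
            elif [[ ${ts[$i]} != "$m" ]]; then
                echo "$i"                 # mtime changed since the last pass
                ts[$i]=$m
            fi
        done
    done
}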

How can I invoke a shell or Perl script from iptables?

We're using CentOS and would like to ban several Asian countries from accessing the entire server. Almost every IP we check which has tried to hack into our server is allocated to an Asian country (Russia, China, Pakistan, etc.)
We have an IP to country MySQL database we can efficiently query and would like to try something like:
-A INPUT -p tcp -m tcp --dport 80 -j /path/to/perlscript.pl
The script would need the IP passed in as an argument, then it would return either an ACCEPT or DROP target?
Thanks for the answers, here's my follow up.
Do you know if it is possible though? Having a rule point to a script which returns a target? (ACCEPT/DROP)
Not entirely sure how ipset works, will have to experiment I guess, but it looks like it creates a single rule. How would it handle Russia for example, which has over 6000 ranges assigned to it? And we want to add probably 20 - 40 countries in total, so we could end up needing to add in excess of 100,000 ranges. Wouldn't the overhead of a single MySQL query be less taxing?
SELECT country FROM ip_countries WHERE $VAR{ip} >= range1 && $VAR{ip} <= range2
The database we use is freely available here : http://software77.net/geo-ip/
It represents IPs in the database by converting the IP to a number using this formula :
$VAR{numberedIP} = $octs[3] + ($octs[2] * 256) + ($octs[1] * 256 * 256) + ($octs[0] * 256 * 256 * 256);
For example, 103.1.2.3 becomes 3 + (2 * 256) + (1 * 65536) + (103 * 16777216) = 1728119299.
It will store the start of the range in the "range1" column, and the end of the range in the "range2" column.
So you can see how we'd look up an IP using the above query. Literally takes less than a hundredth of a second to get a result and it's quite accurate. We have one website on a dedicated server, quite low traffic. But as with all servers I have ever checked, this one is hit daily by hackers' robots, checking email accounts, FTP accounts etc. And just about every web server I've ever worked on is compromised sooner or later. In our case, 99.99% of traffic from Asian countries has criminal intent attached to it.
We'd like this to run via iptables so that all ports are covered, not just HTTP for example by using directives in say .htaccess.
Do you think ipset would still be faster and more efficient?
It would be far too slow to launch perl for every matching packet. The right tool for this sort of thing is ipset, and there is much more information and documentation available on the ipset man page.
In CentOS you can install it with yum. Naturally, all of these commands and the script need to run as root:
# yum install ipset
Next install the kernel modules (you'll want this to happen at boot as well):
# modprobe -v -a ip_set ip_set_hash_netport
And then use a script like the following to populate an ipset and block IP's from its ranges using iptables:
#!/usr/bin/env perl
use strict;
use warnings;
use DBI;
my $dbh = DBI->connect('... your DSN ...',...);
# I have no knowledge of your schema, but if you can pull the
# address range in the form: AA.BB.CC.DD/NN
my $ranges = $dbh->selectcol_arrayref(
q{SELECT cidr FROM your_table WHERE country_code IN ('CN',...)});
`ipset create geoblock hash:net,port`;
for (@$ranges) {
    # to match on port 80:
    `ipset add geoblock $_,80`;
}
`iptables -I INPUT -m set --match-set geoblock src -j DROP`;
If you would like to block all ports rather than just 80, use the ip_set_hash_net kernel module instead of ip_set_hash_netport, change hash:net,port to hash:net, and remove the ,80 from the ipset add command.
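For reference, a sketch of that all-ports variant (same hypothetical schema and set name as above):
`ipset create geoblock hash:net`;
for (@$ranges) {
    `ipset add geoblock $_`;   # no port component with hash:net
}
`iptables -I INPUT -m set --match-set geoblock src -j DROP`;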

keepalived + MySQL with periodic MISC_CHECK

I have a Keepalived + MySQL (master-master) setup done.
I have kept the priority the same for MASTER and BACKUP because I don't want them to start flapping frequently (a one-time switch of the VIP is good enough).
This setup works fine if I use a simple vrrp_script to check whether the mysql daemon is down, e.g.
script to check mysql daemon
vrrp_script chk_mysql {
script "killall -0 mysqld" # verify the pid is exist or not
interval 2 # check every 2 seconds
weight 2
}
I want to make it perform a deeper health check with a Python script. I want to use MISC_CHECK for that.
e.g.
MISC_CHECK {
misc_path "script_to_call_python_script.sh xxxx xxxx xxxx xxxx"
misc_timeout 5
}
My query is:
How can I make the MISC_CHECK run at specified intervals?
Otherwise, what is the required output of the script in vrrp_script, so that I could run
my shell script there (at a periodic interval)?
Place the Python script in a folder and call it from your vrrp_script like:
vrrp_script chk_mysql {
script "location of your python script"
interval "the specified interval"
weight 2
}
Have the script exit with status 0 or 1 depending on the check; keepalived treats 0 as success and non-zero as failure.
As @nimesh said above, vrrp_script supports Python scripts directly. Just put your shell/Python/Ruby script location in the script "location of your script" config.
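For example, a deeper check wired in as a vrrp_script (the script path and arguments are hypothetical; the script must be executable and exit 0 when healthy):
vrrp_script chk_mysql_deep {
    script "/usr/local/bin/mysql_deep_check.py xxxx xxxx"   # hypothetical path and args
    interval 10   # run the deeper check every 10 seconds
    weight 2
}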

Processing MySQL result in bash

I currently have a bash script of a few thousand lines which sends various queries to MySQL to generate applicable output for munin.
Up until now the results were simply numbers, which weren't a problem, but now I'm facing a challenge working with a more complex query in the form of:
$ echo "SELECT id, name FROM type ORDER BY sort" | mysql test
id name
2 Name1
1 Name2
3 Name3
From this result I need to store the id and name (and their respective association) and based on the IDs need to perform further queries, e.g. SELECT COUNT(*) FROM somedata WHERE type = 2 and later output that result paired with the associated name column from the first result.
I'd easily know how to do it in PHP/Ruby, but I'd like to avoid forking another process, especially since it's polled regularly; however, I'm completely lost as to where to start with bash.
Maybe using bash is the wrong approach anyway and I should just fork out?
I'm using GNU bash, version 3.2.39(1)-release (i486-pc-linux-gnu).
My example is not Bash-specific, but I'd like to point out the parameters I use when invoking the mysql command; they suppress the box drawing and the headers.
#!/bin/sh
# -B batch mode (tab-separated output), -N skip column names, -s silent
mysql dbname -B -N -s -e "SELECT * FROM tbl" | while read -r line
do
echo "$line" | cut -f1 # outputs col #1
echo "$line" | cut -f2 # outputs col #2
echo "$line" | cut -f3 # outputs col #3
done
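An alternative that avoids spawning three cut processes per row: let read split the tab-separated columns into variables directly (a sketch using the same query):
#!/bin/sh
# read splits each line on IFS (which includes tabs), so the columns land
# in the named variables; the last variable soaks up any remaining fields
mysql dbname -B -N -s -e "SELECT * FROM tbl" | while read -r col1 col2 col3
do
    echo "$col1"   # outputs col #1
    echo "$col2"   # outputs col #2
    echo "$col3"   # outputs col #3
done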
You would use a while read loop to process the output of that command.
echo "SELECT id, name FROM type ORDER BY sort" | mysql test | while read -r line
do
# you could use an if statement to skip the header line
do_something "$line"
done
or store it in an array:
while read -r line
do
array+=("$line")
done < <(echo "SELECT id, name FROM type ORDER BY sort" | mysql test)
That's a general overview of the technique. If you have more specific questions post them separately or if they're very simple post them in a comment or as an edit to your original question.
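Applied to your concrete case, a sketch using two parallel indexed arrays (bash 3.2 lacks associative arrays; the table and column names come from your question):
#!/bin/bash
ids=()
names=()
# store the id/name pairs from the first query
while IFS=$'\t' read -r id name; do
    ids+=("$id")
    names+=("$name")
done < <(mysql test -B -N -s -e "SELECT id, name FROM type ORDER BY sort")
# run a follow-up COUNT for each id and print it with the associated name
for idx in "${!ids[@]}"; do
    count=$(mysql test -B -N -s -e "SELECT COUNT(*) FROM somedata WHERE type = ${ids[$idx]}")
    echo "${names[$idx]}: $count"
done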
You're going to "fork out," as you put it, to the mysql command line client program anyhow. So either way you're going to have process-creation overhead. With your approach of using a new invocation of mysql for each query you're also going to incur the cost of connecting to and authenticating to the mysqld server multiple times. That's expensive, but the expense may not matter if this app doesn't scale up.
Making it secure against sql injection is another matter. If you prompt a user for her name and she answers "sally;drop table type;" she's laughing and you're screwed.
You might be wise to use a language that's more expressive in the areas that are important for database access for some of your logic. Ruby, PHP, and Perl are all good choices. Perl happens to be tuned and designed to run snappily under shell-script control.

Mapnik ignoring my lat long bounding box

Can anyone see anything wrong with the following set of commands? Every time I run these, image.png is an image of the UK and not the JOSM map I exported. I'm guessing there's something awry with the db import; however, the output mentions that it's processing my coords and data.
Steps:
1 - Exported a .osm file from JOSM or Merkaartor.
2 - Imported into psql using the following command:
osm2pgsql -m -d gis -S ~/mapnik/default.style -b 103,1.3,104,1.4 ion.osm -v -c
The output for this looks like:
marshall#ubuntu:~/mapnik$ osm2pgsql -m -d gis -S ~/mapnik/default.style -b 103,1.3,104,1.4 ion.osm -v -c
osm2pgsql SVN version 0.66-
Using projection SRS 900913 (Spherical Mercator)
Applying Bounding box: 103.000000,1.300000 to 104.000000,1.400000
Setting up table: planet_osm_point
Setting up table: planet_osm_line
Setting up table: planet_osm_polygon
Setting up table: planet_osm_roads
Mid: Ram, scale=100
Reading in file: ion.osm
Processing: Node(25k) Way(3k) Relation(0k)
Node stats: total(25760), max(844548651)
Way stats: total(3783), max(69993379)
Relation stats: total(27), max(536780)
Writing way(3k)
Writing rel(0k)
Committing transaction for planet_osm_point
Sorting data and creating indexes for planet_osm_point
Committing transaction for planet_osm_line
Committing transaction for planet_osm_roads
Sorting data and creating indexes for planet_osm_line
Committing transaction for planet_osm_polygon
Sorting data and creating indexes for planet_osm_roads
Sorting data and creating indexes for planet_osm_polygon
Completed planet_osm_polygon
Completed planet_osm_roads
Completed planet_osm_point
Completed planet_osm_line
I can see the correct lat/lon coords being passed in, but I'm not sure how to verify this within the database.
3 - ./generate_xml.py --accept-none --dbname gis --symbols ./symbols/ --world_boundaries ../world_boundaries/
4 - ./generate_image.py
At this point image.png is a map of the UK, not Singapore which I have specified.
Can anyone see anything wrong with this? This is with Mapnik 0.7.1 on Ubuntu.
Found the solution.
The issue is that the generate_image.py script does not read the bounding box from the database but rather has it hardcoded inside. I'm not sure of the reasoning behind this.
The solution is to edit generate_image.py manually and change the relevant line:
ll = (103,1.3,104,1.4)
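For context, in the stock OSM generate_image.py that tuple feeds the map's bounding box roughly like this (a sketch; details vary between script versions):
import mapnik
ll = (103, 1.3, 104, 1.4)   # (min_lon, min_lat, max_lon, max_lat) -- Singapore
prj = mapnik.Projection(m.srs)   # m is the mapnik.Map the script has already loaded
c0 = prj.forward(mapnik.Coord(ll[0], ll[1]))
c1 = prj.forward(mapnik.Coord(ll[2], ll[3]))
m.zoom_to_box(mapnik.Envelope(c0.x, c0.y, c1.x, c1.y))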