MySQL online vs. batch processing - options for preventing MySQL cron jobs from blocking online queries? - mysql

Is there a way, other than setting query priority, to prevent nightly cron jobs that do batch processing against MySQL from impacting online webserver->MySQL queries? I'm thinking there may be a way to segment these, but I'm not sure whether that's possible.

Try breaking the queries down: rather than processing lots of data in one go, process smaller batches more often. That way you will lock tables for less time and leave gaps in which queries from the frontend can be executed.
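For example, here is a minimal sketch of batching a big purge, assuming a hypothetical activity_log table; the cron script would repeat the DELETE until it affects zero rows, sleeping briefly between iterations:

    -- Purge old rows in small batches instead of one huge DELETE,
    -- so each statement holds its locks only briefly.
    DELETE FROM activity_log
    WHERE created_at < NOW() - INTERVAL 30 DAY
    LIMIT 1000;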
Another solution is to process smaller amounts more often, even during the day. My last project used an event system: a user would comment on something and this event would go into a queue. A background process (executed from The Fat Controller) would then take the event and insert data so that the news feeds of all the user's friends were updated about the comment. That way feeds are updated by simple insert statements and not rebuilt from scratch every x hours.
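A sketch of that fan-out step, with entirely hypothetical table names and IDs (news_feed, friendships):

    -- When a comment event is dequeued, push it onto each friend's feed
    -- with a single INSERT ... SELECT instead of rebuilding every feed.
    INSERT INTO news_feed (user_id, event_type, comment_id, created_at)
    SELECT f.friend_id, 'comment', 123, NOW()
    FROM friendships AS f
    WHERE f.user_id = 42;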

Related

How expensive are MySQL events?

In my web app I use two recurring events that "clean up" one of the tables in the database, both executed every 15 minutes or so.
My question is, could this lead to performance problems in the future? I've read somewhere (I don't recall where exactly) that MySQL events are supposed to be scheduled to run once a month or so. Thing is, these same events keep the table at a pretty reduced size (as they delete records older than ~15 minutes), so maybe that compensates for the frequency of their execution, right?
Also, is it better to have one big MySQL event or many small ones if they are called with the same frequency?
I don't think the monthly figure is a performance guideline, more a suggestion of what you might do with events. So I think you're OK doing your cleanup with them.
In the end, the documentation suggests that events are:
Conceptually, this is similar to the idea of the Unix crontab (also known as a “cron job”) or the Windows Task Scheduler.
And the concept for those is that you can run a task every minute if you wish to do so.
On the second part of that question:
Serialize them or spread them out. If you split the work into many events that all run at the same time, you will create spikes of possibly very high CPU usage that might slow down the application while the events are processed.
So either pack everything into one event so it runs in succession, or spread the individual events out so they execute at different times within the 15-minute timeframe. Personally I think the first option is preferable: pack them into a single event, as they are then guaranteed to run in succession, even if one of them keeps running longer than usual (see the sketch below).
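A minimal sketch of such a single packed event, assuming hypothetical sessions and tmp_codes tables and MySQL 5.1+ with the event scheduler enabled (SET GLOBAL event_scheduler = ON):

    DELIMITER //
    CREATE EVENT cleanup_every_15_min
    ON SCHEDULE EVERY 15 MINUTE
    DO
    BEGIN
        -- Both cleanup steps run in succession inside one event.
        DELETE FROM sessions  WHERE created_at < NOW() - INTERVAL 15 MINUTE;
        DELETE FROM tmp_codes WHERE created_at < NOW() - INTERVAL 15 MINUTE;
    END//
    DELIMITER ;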
The same goes for cron jobs. If you schedule 30 long-running exports at a single time, your application is going to fail miserably during that timeslot (learned that the hard way).

Is it possible to use cron too much?

I run a game statistics site. Its MySQL database is small potatoes compared to most of the things people work on around here, but shared hosting does necessitate an eye on query optimization, particularly when performing lots of joins and sub-queries.
Earlier this week I moved a rather slow (~0.5s) query that grouped, counted, averaged, and sorted the ratings of members to a nightly cron job. Results are stored in a table.
Because we average about one new rating per day, the change does not cause any perceptible data inaccuracy for my users, AND the new query, which just grabs rows from the table, runs in the ~0.000X range, so all pages pulling that data are noticeably faster.
Clearly this is a good thing!
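A sketch of the pattern, with hypothetical table and column names; the nightly cron just reruns the expensive aggregation into a small summary table that page loads read from:

    -- Rebuild the summary table once a night; page reads hit only this table.
    TRUNCATE TABLE member_rating_summary;
    INSERT INTO member_rating_summary (member_id, rating_count, avg_rating)
    SELECT member_id, COUNT(*), AVG(rating)
    FROM ratings
    GROUP BY member_id;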
And as I sat there basking in the glow of my cron job, my mind started running through other aspects of the site and mentally tagging those that could be cron'd... (many)
Which leads me to wonder - is it possible to use cron too much?
Because my site's database changes about once a day, I could conceivably run ALL complex queries (there are many) through nightly cron jobs and store the results in tables.
Is there ever a downside? (apart from data occasionally not being up-to-the-second accurate?)
Cron is great; it's usually a good thing to refrain from reinventing wheels. Some applications have more precise needs than cron can accommodate, so that's one reason not to use it. Also, distributing and managing cron jobs that are to form an integral part of your app can be difficult and error-prone, especially in the absence of a competent OS package manager. Troubleshooting can be a bit of a pain, particularly when one server is missing one of its 100 cron jobs or something, but that can be managed with an OS package manager or with something like Puppet.
But my opinion is to use cron whenever you can and it makes sense, rather than rolling your own.
You're not beginning to approach the limits of how many jobs can (or should) be scheduled with cron. You'll be just fine. :)
You might want to consider a worker/message queue like Gearman to trigger jobs that should be run 'after the fact', but not necessarily on a fixed schedule.
How about one cron job that runs all your procedures?
I once worked on a Unix system that failed pretty miserably after the cron job queue exceeded 20 entries. The queue did not execute on any predictable cycle (i.e., FILO, FIFO, LIFO, etc.); it was simply randomized.
You might consider using triggers to keep your summary statistics up to date. There's also an event scheduler in MySQL 5.1+ if you like running queries periodically.
http://dev.mysql.com/doc/refman/5.0/en/triggers.html
http://dev.mysql.com/doc/refman/5.1/en/events.html
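A sketch of the trigger approach, again with hypothetical names; storing a count and a running sum keeps the average derivable as sum/count without rescanning the table:

    DELIMITER //
    CREATE TRIGGER ratings_after_insert
    AFTER INSERT ON ratings
    FOR EACH ROW
    BEGIN
        -- Keep the per-member summary current on every new rating.
        UPDATE member_rating_summary
        SET rating_count = rating_count + 1,
            rating_sum   = rating_sum + NEW.rating
        WHERE member_id = NEW.member_id;
    END//
    DELIMITER ;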

Most efficient method of logging data to MySQL

We have a service which sees several hundred simultaneous connections throughout the day, peaking at about 2000, for about 3 million hits a day, and growing. With each request I need to log 4 or 5 pieces of data to MySQL. We originally used the logging that came with the app we were using; however, it was terribly inefficient, ran my DB server at >3x the average CPU load, and would eventually bring the server to its knees.
At this point we are going to add our own logging to the application (PHP). The only option I have for logging data is the MySQL DB, as this is the only common resource available to all of the HTTP servers. This data will be mostly writes; however, every day we generate reports based on the data, then crunch and archive the old data.
What recommendations can be made to ensure that I don't take down our services with logging data?
The solution we took with this problem was to create an archive table, then regularly (every 15 minutes, on an app server) crunch the data and put it back into the tables used to generate reports. The archive table of course did not have any indices; the tables the reports are generated from have several.
Some stats on this approach:
Short Version: >360 times faster
Long Version:
The original code/model did direct inserts into the indexed table, and the average insert took 0.036 seconds; using the new code/model, inserts took less than 0.0001 seconds (I was not able to get an accurate fix on a single insert, so I had to measure 100,000 inserts and average the insert time). The post-processing (crunch) took an average of 12 seconds for several tens of thousands of records. Overall we were greatly pleased with this approach, and so far it has worked incredibly well for us.
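A sketch of that archive-and-crunch cycle with hypothetical names (log_archive, page_stats); page_stats is assumed to have a unique key on (page_id, hit_date):

    -- The archive table deliberately has no indices, so inserts are as
    -- cheap as possible.
    CREATE TABLE log_archive (
        logged_at DATETIME NOT NULL,
        page_id   INT      NOT NULL
    );

    -- Every 15 minutes: atomically swap in an empty table so no hits
    -- arrive mid-crunch, aggregate the old one, then drop it.
    CREATE TABLE log_archive_new LIKE log_archive;
    RENAME TABLE log_archive TO log_archive_work,
                 log_archive_new TO log_archive;

    INSERT INTO page_stats (page_id, hit_date, hits)
    SELECT page_id, DATE(logged_at), COUNT(*)
    FROM log_archive_work
    GROUP BY page_id, DATE(logged_at)
    ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits);

    DROP TABLE log_archive_work;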
Based on what you describe, I recommend you leverage the fact that you don't need to read this data immediately and pursue a "periodic bulk commit" route. That is, buffer the logging data in RAM on the app servers and do periodic bulk commits. If you have multiple application nodes, some sort of randomized approach would help even more (e.g., commit updated info every 5 +/- 2 minutes).
The main drawback with this approach is that if an app server fails, you lose the buffered data. However, that's only bad if (a) you absolutely need all of the data and (b) your app servers crash regularly. Small chance that both are true, but in the event they are, you can simply persist your buffer to local disk (temporarily) on an app server if that's really a concern.
The main idea is:
buffering the data
periodic bulk commits (leveraging some sort of randomization in a distributed system would help); see the sketch below
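As a sketch of what one flush might look like, with hypothetical column names: a single multi-row INSERT is far cheaper than the equivalent stream of single-row INSERTs.

    -- One bulk commit of rows the app buffered in RAM over the last
    -- few minutes.
    INSERT INTO request_log (logged_at, url, user_id, status)
    VALUES
        ('2012-01-01 00:00:01', '/home',    17, 200),
        ('2012-01-01 00:00:02', '/search',  99, 200),
        ('2012-01-01 00:00:02', '/profile',  3, 404);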
Another approach is to stop opening and closing connections if possible (e.g., keep longer-lived connections open). While that's likely a good first step, it may require a fair amount of work on a part of the system that you may not have control over. But if you do, it's worth exploring.

Lazy deletion of table rows

Is there any software that does "lazy" deletion of rows from a table? I would like to do maintenance on my tables when my server is idle, and ideally I should be able to define what "idle" is (number of database connections / system load / requests per second). Is there anything remotely similar to this?
If you are on a Linux server, you can make your table cleanup scripts run only based on the output of the command "w", which shows the system load. If your system load is under, say, 0.25, you can run your script. Do this with shell scripting.
To some degree, from an internal perspective InnoDB already does this. Rows are initially marked as deleted, but only made free as part of a background operation.
My advice: you can get into needlessly complicated problems if you try to first check whether the server is idle, e.g.:
What if it was idle, but the cleanup takes 2 minutes, and during those 2 minutes the server load peaks?
What if the server never becomes idle enough? Now you just have an unlimited backlog.
If you just background the task you might improve performance enough, since at least no users will be sitting in front of web pages waiting for it to complete. Look at activity graphs to find the best time to schedule it (3am, 5am, etc.).

MySQL tracking system

I have to implement a tracking system backed by a MySQL database. The system will track many apps, with at least 5 events tracked for each app (e.g., how many users clicked on link x, how many users visited page y). Some apps will have millions of users, so a few thousand updates/second is not a far-fetched assumption.
Another component of the system will have to compute some statistical info that should be updated every minute. The system should also record past values of those statistics.
The approach a friend of mine suggested was to log every event in a log table and have a cron job that runs every minute and computes the desired info and updates a stats table.
This sounds reasonable to me. Are there better alternatives?
Thanks.
I've logged to a MySQL log table with a cron that crunches it.
I generally use InnoDB tables in my apps, but for the log table I made it MyISAM and used INSERT DELAYED ... queries.
MyISAM doesn't provide all the goodies of InnoDB, but I believe it is slightly faster (for that reason).
The main thing you are worried about is database locking when your cron is running, but using INSERT DELAYED gets around that problem for the most part.
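A sketch of that setup with hypothetical names; note that INSERT DELAYED only works with MyISAM-style engines, and that it was deprecated in MySQL 5.6 and is no longer supported in 5.7:

    CREATE TABLE hit_log (
        hit_at   DATETIME NOT NULL,
        app_id   INT      NOT NULL,
        event_id INT      NOT NULL
    ) ENGINE=MyISAM;

    -- The client returns immediately; the server queues the row and
    -- writes it when the table is free.
    INSERT DELAYED INTO hit_log (hit_at, app_id, event_id)
    VALUES (NOW(), 1, 5);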
If your hit rate is too high for even INSERT DELAYED into a MyISAM table to handle, you may want to keep recent hits in memory (memcached can come in handy, or a custom daemon you can write) and periodically process the hits from memory into the aggregated database stats table.
I would really recommend using an already existing log analyzer on the logs your web server already produces. One example is Webalizer. Even better, in my opinion, is an external system such as Google Analytics. This works better since it will keep working with intermediate systems such as load balancers and caches in place.