SQL Server job steps have a "Retry attempts" setting; if a step fails, then depending on that setting, SQL Server tries to restart the job.
We have this set to 5 attempts for our replication agents, which run continuously.
I need to know how many of those retries have been exhausted so far. For example, if an agent has already failed 4 times, I'd rather stop and start the agent, which resets the counter, than wait for it to fail during off hours.
Is there a way to see how many retries have already been used?
View the job history and open the current execution. It should contain a column named Retries Attempted; you might have to scroll to the right a bit, as it's one of the last columns.
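The same information is available by querying msdb directly. A minimal sketch, assuming a placeholder job name for your agent (retry attempts are typically logged as history rows with run_status = 2):

```sql
-- Sketch: read recent history rows, including retries, for one job.
-- N'YourReplicationAgent' is a placeholder for your agent's job name.
SELECT TOP (10)
       j.name,
       h.step_id,
       h.run_status,          -- 0 = failed, 1 = succeeded, 2 = retry
       h.retries_attempted,
       h.run_date,
       h.run_time
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs       AS j ON j.job_id = h.job_id
WHERE j.name = N'YourReplicationAgent'
ORDER BY h.instance_id DESC;
```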
When I retry a transaction that failed at the commit stage, the retry fails as well. This is running Galera Cluster under MariaDB 10.2.6.
The sequence of events goes like this:
Commit a transaction (say a single insert).
COMMIT fails with error 1213 "Deadlock found when trying to get lock"
Begin a new transaction to replay the SQL statement[s].
BEGIN fails with error 1047 "WSREP has not yet prepared node for application use"
My application bails out to avoid a more serious crash (see notes below).
This happens quite regularly and although the cluster recovers, individual threads receive failures. Yesterday this happened 15 times in one second.
I cannot identify any root cause for this. It seems that the deadlock is the initiator of the problem. The situation should be recoverable (and often is), but with multiple clients all trying to resolve their deadlocks at the same time, the whole thing seems to just fail.
Notes:
This is related to an earlier question where retrying failed transactions caused a total crash of the cluster. I've managed to prevent crashes by retrying transactions only on deadlocks, i.e., if a different type of error occurs during a retry, the application gives up.
I'm aware that 10.2.6 is not the latest version of MariaDB. I'm nervous about upgrading right now as I've had such bad experiences. I would like to understand the current problem before doing an upgrade, and I've been unable to reproduce the errors in a test environment.
I'm not sure, but I suspect 3 tries (not 2) is appropriate. Committing involves two steps:
Checking for a Deadlock purely within the node you are connected to. (Eg: another query is touching the same row or gap.)
Checking with the other nodes to see if they will complain. (Eg: The same row has already been inserted into another node.)
Sure, either of those could happen repeatedly, and in any order. But making 3 tries seems reasonable; a sketch of such a retry loop follows the questions below.
Now, once you have failed "too many" times, it is right to abort and get a human (a DBA type) involved. I suspect that you could restructure your code / application logic / etc. in some way to avoid most of the failures. Would you like to provide more details, so we can discuss that possibility?
What kind of table? (Queue, transactions, logging, etc)
SHOW CREATE TABLE. (auto_inc, unique keys, etc; too many UNIQUE keys can aggravate the situation)
What does the INSERT look like?
How often do you run inserts like this one? How often does it fail? (Instrument your code so you count even those that you can recover from.)
How spread out is the Cluster? (ping time)
What other queries are hitting the table? (They may be aggravating the issue.)
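Here is the retry sketch mentioned above, written as a MariaDB stored procedure purely for illustration (an application-side loop would follow the same shape); the table and the statement inside the transaction are hypothetical:

```sql
DELIMITER $$
CREATE PROCEDURE insert_with_retry()
BEGIN
  DECLARE attempts INT DEFAULT 0;
  DECLARE done BOOLEAN DEFAULT FALSE;
  WHILE NOT done AND attempts < 3 DO
    BEGIN
      -- SQLSTATE '40001' is the deadlock class (covers error 1213).
      -- InnoDB has already rolled the transaction back by the time the
      -- handler fires, so we only count the attempt and loop again.
      DECLARE EXIT HANDLER FOR SQLSTATE '40001'
        SET attempts = attempts + 1;
      START TRANSACTION;
      INSERT INTO t (col) VALUES ('x');   -- hypothetical statement(s)
      COMMIT;
      SET done = TRUE;
    END;
  END WHILE;
  IF NOT done THEN
    -- Abort and get a human involved.
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'Gave up after 3 deadlock retries';
  END IF;
END$$
DELIMITER ;
```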
We have 3 REST applications within a cluster.
So each application server can receive requests from "outside".
We also have timed events, which analyse the database, add/remove rows, send emails, etc.
The problem is that each application server starts these timed events, and it happens that 2 application servers start the same analysis job at the same time.
We have an SQL table in the back end.
Our idea was to lock a table in the SQL database when starting the job. If the table is already locked, we exit the job, because another application server has just started the analysis.
What's a good practice for implementing some kind of semaphore?
Any ideas?
Don't use semaphores; you are overcomplicating things. Just use message queueing, where you queue your tasks and have them executed in order.
Have ONLY one separate node/process/child_process consume from the queue and get your tasks done.
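If you would rather not run a separate broker, the queue itself can live in the database you already have. A minimal sketch, assuming MySQL 8.0+ or MariaDB 10.6+ for SKIP LOCKED, with hypothetical names:

```sql
CREATE TABLE task_queue (
  id      BIGINT AUTO_INCREMENT PRIMARY KEY,
  payload JSON NOT NULL,
  done    TINYINT NOT NULL DEFAULT 0
) ENGINE=InnoDB;

-- The single consumer claims the oldest open task. SKIP LOCKED makes the
-- pattern safe even if a second consumer is ever added: it simply skips
-- rows another consumer is already working on.
START TRANSACTION;
SELECT id, payload INTO @id, @payload
FROM task_queue
WHERE done = 0
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- ... perform the task ...

UPDATE task_queue SET done = 1 WHERE id = @id;
COMMIT;
```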
We (at a previous employer) used a database-based semaphore. Each of several (for redundancy and load sharing) servers had the same set of cron jobs. The first thing in each was a custom library call that did:
Connect to the database and check for (or insert) "I'm working on X".
If the flag was already set, then the cron job silently exited.
When finished, the flag was cleared.
The table included a timestamp and a host name -- for debugging and recovering from cron jobs that fail to finish gracefully.
I forget how the "test and set" was done. Possibly an optimistic INSERT, then check for "duplicate key".
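A minimal sketch of that optimistic test-and-set, with illustrative names (the timestamp and host name columns serve the debugging purpose mentioned above):

```sql
CREATE TABLE job_lock (
  job_name  VARCHAR(64) PRIMARY KEY,
  locked_by VARCHAR(64) NOT NULL,
  locked_at TIMESTAMP   NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;

-- Acquire: exactly one server succeeds; the others hit a duplicate-key
-- error (1062) and silently exit their cron job.
INSERT INTO job_lock (job_name, locked_by) VALUES ('analyse_db', @@hostname);

-- Release when the job finishes:
DELETE FROM job_lock WHERE job_name = 'analyse_db';
```

MySQL's built-in GET_LOCK()/RELEASE_LOCK() functions are an alternative, though those locks vanish with the connection that took them, which can be a feature (automatic cleanup) or a limitation (the job must hold a connection for its whole run).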
I'm running PHP command-line scripts as RabbitMQ consumers which need to connect to a MySQL database. Those scripts run as Symfony2 commands using the Doctrine2 ORM, meaning opening and closing the database connection is handled behind the scenes.
The connection is normally closed automatically when the CLI command exits - which by definition does not happen for a long time in a background consumer.
This is a problem when the consumer is idle (no incoming messages) for longer than the wait_timeout setting in the MySQL server configuration. If no message is consumed for longer than that period, the database server will close the connection and the next message will fail with a "MySQL server has gone away" exception.
I've thought about 2 solutions for the problem:
Open the connection before each message and close the connection manually after handling the message.
Implementing a ping message which runs a dummy SQL query like SELECT 1 FROM table every n minutes, and calling it via a cronjob.
The problem with the first approach is: if the traffic on that queue is high, there might be significant overhead for the consumer in opening/closing connections. The second approach just sounds like an ugly hack to deal with the issue, but at least I can use a single connection during high-load times.
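For reference, the timeout involved can be inspected, and as a stopgap raised, on the MySQL side; a sketch, assuming the consumer's account has the needed privileges:

```sql
-- Inspect the idle timeout that triggers "MySQL server has gone away":
SHOW VARIABLES LIKE 'wait_timeout';

-- Stopgap: raise it for the consumer's own session (value is in seconds).
SET SESSION wait_timeout = 28800;
```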
Are there any better solutions for handling doctrine connections in background scripts?
Here is another solution: try to avoid long-running Symfony 2 workers; they tend to cause problems due to their long execution time, and the kernel isn't made for that.
The solution here is to build a proxy in front of the real Symfony command, so every message triggers a fresh Symfony kernel. Sounds like a good solution to me.
http://blog.vandenbrand.org/2015/01/09/symfony2-and-rabbitmq-lessons-learned/
My approach is a little bit different. My workers only process one message, then die. I have supervisor configured to create a new worker every time. So, a worker will:
Ask for a new message.
If there are no messages, sleep for 20 seconds; without the sleep, supervisor would think something is wrong and stop recreating the worker.
If there is a message, process it.
Maybe, if processing a message is super fast, sleep for the same reason as in step 2.
After processing the message, just finish.
This has worked very well using AWS SQS.
Comments are welcome.
Running PHP scripts for too long is a big problem in general. For me, the best solution is to restart the script periodically. You can see how to do this in this topic: How to restart PHP script every 1 hour?
You should also run multiple instances of your consumer. Add a counter to each one and terminate it after some number of runs. Then you need a tool to ensure a consistent number of worker processes. Something like this: http://kamisama.me/2012/10/12/background-jobs-with-php-and-resque-part-4-managing-worker/
Before converting a project to use MySQL, I have questions regarding the best way to avoid losing a simple record update due to either a server crash or a program shutdown caused by exceeding the CGI run-time limit.
My project is public and therefore applicable to any/many hosts where high-level server-side management isn't an option.
I wish to open a list file (or table) and acquire a list of records to parse one at a time.
While parsing each acquired list record, have the program / script perform a task with each record and update a counter (simple table) upon successful completion of each task (alternatively update each record with a success flag).
Do MySQL tables get automatically written to the hard drive when updated or added to, thus avoiding the loss of all table changes up to the point of the crash if/when the program/script is violently terminated as described?
To have any chance of doing the same with simple text files, the counter file has to be opened and closed for each update (as the contents of open files on most OSes get clobbered in a crash).
Any outline of the MySQL commands/processes to follow, if needed to avoid the described losses, would also be very much appreciated.
Also, do any suggestions apply to both InnoDB and MyISAM?
A simple answer comes to mind: SQL transactions. They're like a batch of SQL commands that (1) have to be "committed" and (2) take effect only if every command in the batch executes successfully.
I think this would help:
http://www.sqlteam.com/article/introduction-to-transactions
If my answer wasn't correct, please let me know if I misunderstood your intentions.
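To tie this to the InnoDB/MyISAM part of the question: InnoDB supports transactions and MyISAM does not, so the sketch below assumes InnoDB tables. A minimal version of the per-record update described in the question, with hypothetical names:

```sql
-- Group each record's work so a crash loses at most the uncommitted step.
START TRANSACTION;
UPDATE list_items SET done_flag = 1 WHERE id = 42;  -- flag the parsed record
UPDATE progress   SET counter = counter + 1;        -- bump the success counter
COMMIT;
-- With InnoDB's default innodb_flush_log_at_trx_commit = 1, the log is
-- flushed to disk before COMMIT returns, so committed changes survive a
-- crash or a violent script termination.
```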
I have a problem with a Grails-based application connected to MySQL, where a process updates a record as part of a larger transaction. This process also kicks off a second thread via a Quartz job that performs some additional changes. The Quartz job typically starts before the first thread commits the transaction, so the job loops for up to one minute, checking for the record to change to the expected state. Oddly, it works consistently in some environments, fails consistently in one, and fails infrequently in yet another.
My question has to do with how MySQL recognizes transaction commits between two concurrent connections. One would expect that when connection A performs the commit, subsequent queries from connection B would see the committed change. In my case connection B will have made the same query one or more times before connection A has made the commit. It appears that MySQL is caching the query results for the connection. Oddly enough, while connection B is repeatedly querying and getting the old value, I can issue the same query via the mysql client and see the new value. Is anyone aware of a caching or concurrency issue here?
For the above observation I have the MySQL log enabled in order to see the individual updates, commits, and queries occurring.
The various environments are using different versions of MySQL, as shown below. I'm in the process of upgrading my environments to the latest MySQL to see if that resolves it.
5.0.51a - two environments that have been very stable with infrequent occurrences; however, one of them started having more occurrences over the weekend under moderate traffic.
5.1.55 - one environment that consistently fails
Thanks,
John
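One likely explanation, for what it's worth: this looks like InnoDB's default REPEATABLE READ isolation rather than caching. Connection B's first SELECT inside a transaction establishes a consistent snapshot, and every later SELECT in that same transaction reads from that snapshot, so A's commit stays invisible until B starts a new transaction. The mysql client sees the new value because, with autocommit on, each of its statements runs as its own fresh transaction. A minimal illustration, with a hypothetical orders table:

```sql
-- Connection B (default REPEATABLE READ isolation):
START TRANSACTION;
SELECT status FROM orders WHERE id = 1;  -- returns 'pending'

-- Connection A now runs and commits:
--   UPDATE orders SET status = 'done' WHERE id = 1; COMMIT;

SELECT status FROM orders WHERE id = 1;  -- still 'pending': same snapshot
COMMIT;
SELECT status FROM orders WHERE id = 1;  -- new transaction: sees 'done'

-- Alternatively, let each statement see the latest committed data:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```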