Quartz clustered scheduler deadlock - MySQL

We randomly get the exception below while running a Quartz clustered scheduler on 6 instances:
Couldn't acquire next trigger: Deadlock found when trying to get lock;
try restarting transaction
Here is our quartzConfig.properties:
scheduler.skipUpdateCheck = true
scheduler.instanceName = 'quartzScheduler'
scheduler.instanceId = 'AUTO'
threadPool.threadCount = 13
threadPool.threadPriority = 5
jobStore.misfireThreshold = 300000
jobStore.'class' = 'org.quartz.impl.jdbcjobstore.JobStoreTX'
jobStore.driverDelegateClass = 'org.quartz.impl.jdbcjobstore.StdJDBCDelegate'
jobStore.useProperties = true
jobStore.dataSource = 'myDS'
jobStore.tablePrefix = 'QRTZ_'
jobStore.isClustered = true
jobStore.clusterCheckinInterval = 10000
dataSource.myDS.driver='com.mysql.jdbc.Driver'
dataSource.myDS.maxConnections = 15
We are using the Quartz Grails plugin (with Quartz 2.2.1) in our application with a MySQL database.
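For what it's worth, two adjustments that are commonly suggested for this symptom (a sketch only, not verified against this exact setup) are giving the Quartz datasource some headroom above threadPool.threadCount and making trigger acquisition take the database lock up front, which on MySQL tends to turn the deadlock into an ordinary lock wait. Using the same shorthand key style as the config above, with illustrative values:
# headroom above threadPool.threadCount = 13 (value is illustrative)
dataSource.myDS.maxConnections = 18
# acquire triggers while holding the DB lock instead of optimistically
jobStore.acquireTriggersWithinLock = true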

Related

Zabbix - Utilization of housekeeper processes over 75%

I'm monitoring about 1300 hosts on Zabbix. After I defined bulk templates for all hosts, I got the "Utilization of housekeeper processes over 75%" alarm, and it was not resolved for about 20 hours. I did not define any housekeeper settings in my server configuration. How can I resolve this alarm, and what is its effect? I am using PostgreSQL.
Server config:
StartPollers=50
StartPollersUnreachable=50
StartPingers=50
StartDiscoverers=50
StartHTTPPollers=50
CacheSize=1024M
HistoryCacheSize=1024M
TrendCacheSize=1024M
ValueCacheSize=1024M
LogSlowQueries=3000
MaxHousekeeperDelete=5000
PostgreSQL config:
max_connections = 1000
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 1048kB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 12
max_parallel_workers_per_gather = 4
max_parallel_workers = 12
max_parallel_maintenance_workers = 4
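For reference, the internal housekeeper is tuned in zabbix_server.conf rather than in the database; a minimal sketch of the parameters usually looked at first (values are illustrative only, not a recommendation for this installation):
# how often the housekeeper runs, in hours (0 disables internal housekeeping entirely)
HousekeepingFrequency=1
# maximum rows deleted per item per housekeeping cycle (already 5000 above)
MaxHousekeeperDelete=5000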

How to improve the launch of gitlab-runner?

I installed gitlab-runner on my VPS, but every time I use the following in gitlab-ci.yml
tags:
- vps
the job is held for at least 5 minutes each time.
I installed gitlab runner with
apt-get install gitlab-runner
whereas if I use the gitlab-runner from gitlab.com I don't get the hold.
gitlab-runner info:
gitlab-runner status
Runtime platform arch=amd64 os=linux pid=662441 revision=133d7e76 version=15.6.1
concurrent = 1
check_interval = 0
shutdown_timeout = 0
[session_server]
session_timeout = 1800
runner 1:
[[runners]]
name = "name1"
url = "https://gitlab.com/"
id = 123
token = "ABC"
token_obtained_at = 2022-11-26T20:24:16Z
token_expires_at = 0001-01-01T00:00:00Z
executor = "shell"
[runners.custom_build_dir]
[runners.cache]
MaxUploadedArchiveSize = 0
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
runner 2:
[[runners]]
name = "runner2"
url = "https://gitlab.com/"
id = 456
token = "ABC"
token_obtained_at = 2022-11-26T20:34:45Z
token_expires_at = 0001-01-01T00:00:00Z
executor = "docker"
[runners.custom_build_dir]
[runners.cache]
MaxUploadedArchiveSize = 0
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
[runners.docker]
tls_verify = false
image = "ruby:latest"
privileged = false
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
Thank you.
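Not an answer as such, but a few standard diagnostics that may help narrow down where the hold comes from, assuming the runner was installed as a system service via the apt package:
# confirm the registered runners can still authenticate against gitlab.com
sudo gitlab-runner verify
# confirm the runner service is actually running, then restart it and watch a job
sudo systemctl status gitlab-runner
sudo gitlab-runner restart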

Confluent Kafka Python client Avro producer.produce() executes without error but no data in topic

My producer isn't throwing any errors, but data is not being sent to the destination topic. Can you recommend any techniques to debug this situation?
I call a Confluent Python AvroProducer inside a synchronous loop to send data to a topic, like so:
self.producer.produce(topic=test2, value=msg_dict)
After this call I have a piece of code like so to flush the queue:
num_messages_in_queue = self.producer.flush(timeout = 2.0)
print(f"flushed {num_messages_in_queue} messages from producer queue in iteration {num_iterations} ")
This executes without any error, but no delivery callback fires after it runs. My producer is initialized as follows:
def __init__(self, broker_url=None, topic=None, schema_registry_url=None, schema_path=None):
    try:
        with open(schema_path, 'r') as content_file:
            schema = avro.loads(content_file.read())
    except Exception as e:
        print(f"Error when trying to read avro schema file : {schema_path}")
    self.conf = {
        'bootstrap.servers': broker_url,
        'on_delivery': self.delivery_report,
        'schema.registry.url': schema_registry_url,
        'acks': -1,  # guarantees the record will not be lost as long as at least one in-sync replica remains alive
        'enable.idempotence': False,
        "error_cb": self.error_cb
    }
    self.topic = topic
    self.schema_path = schema_path
    self.producer = AvroProducer(self.conf, default_key_schema=schema, default_value_schema=schema)
My callback method is as follows:
def delivery_report(self, err, msg):
    print(f"began delivery_report")
    if err is None:
        print(f"delivery_report --> Delivered msg.value = {msg.value()} to topic = {msg.topic()} offset = {msg.offset()} without err.")
    else:
        print(f"conf_worker AvroProducer failed to deliver message {msg.value()} to topic {self.topic}. got error= {err}")
After this code is executed, I look at my topic on the schema registry container like so:
docker exec schema_registry_container kafka-avro-console-consumer --bootstrap-server kafka:29092 --topic test2 --from-beginning
I see this output:
[2020-04-03 15:48:38,064] INFO Registered kafka:type=kafka.Log4jController MBean
(kafka.utils.Log4jControllerRegistration$)
[2020-04-03 15:48:38,742]
INFO ConsumerConfig values:
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [kafka:29092]
check.crcs = true
client.dns.lookup = default
client.id =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = console-consumer-49056
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
(org.apache.kafka.clients.consumer.ConsumerConfig)
[2020-04-03 15:48:38,887] INFO Kafka version : 2.1.0-cp1 (org.apache.kafka.common.utils.AppInfoParser)
[2020-04-03 15:48:38,887] INFO Kafka commitId : bda8715f42a1a3db (org.apache.kafka.common.utils.AppInfoParser)
[2020-04-03 15:48:39,221] INFO Cluster ID: KHKziPBvRKiozobbwvP1Fw (org.apache.kafka.clients.Metadata)
[2020-04-03 15:48:39,224] INFO [Consumer clientId=consumer-1, groupId=console-consumer-49056] Discovered group coordinator kafka:29092 (id: 2147483646 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-04-03 15:48:39,231] INFO [Consumer clientId=consumer-1, groupId=console-consumer-49056] Revoking previously assigned partitions []
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2020-04-03 15:48:39,231] INFO [Consumer clientId=consumer-1, groupId=console-consumer-49056] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-04-03 15:48:42,264] INFO [Consumer clientId=consumer-1, groupId=console-consumer-49056] Successfully joined group with generation 1
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-04-03 15:48:42,267] INFO [Consumer clientId=consumer-1, groupId=console-consumer-49056] Setting newly assigned partitions [test2-0] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2020-04-03 15:48:42,293] INFO [Consumer clientId=consumer-1, groupId=console-consumer-49056] Resetting offset for partition test2-0 to offset 0. (org.apache.kafka.clients.consumer.internals.Fetcher)
So the answer is so trivial that it's embarrassing!
But it does point to the fact that in a multilayered infrastructure, a single incorrectly set value can result in a silent failure that is very tedious to track down.
The issue came from an incorrect parameter in my docker-compose.yml file, where the environment variable for broker_url was not set.
The application code needed this variable to reference the Kafka broker.
However, no exception was thrown for the missing parameter and it was failing silently.
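A cheap guard against this class of failure is to validate the required settings before ever constructing the producer. A minimal sketch, using hypothetical environment variable names (substitute whatever the docker-compose.yml actually defines):
import os

# Hypothetical variable names -- adjust to match your docker-compose.yml.
broker_url = os.environ.get("KAFKA_BROKER_URL")
schema_registry_url = os.environ.get("SCHEMA_REGISTRY_URL")

missing = [name for name, value in (("KAFKA_BROKER_URL", broker_url),
                                    ("SCHEMA_REGISTRY_URL", schema_registry_url)) if not value]
if missing:
    # Fail fast instead of letting the producer drop messages silently.
    raise RuntimeError("Missing required environment variables: " + ", ".join(missing))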

Pause a single session but keep processing others

I want a mysql-proxy Lua script to handle interleaving accesses to a website (e.g. two different browser windows/users), while being able to pause/delay one of the two sessions without influencing the other. Handling sessions interleavingly is possible in mysql-proxy Lua (judging by the output listed below), but as soon as I start delaying within the script it blocks everything and the other session cannot advance either.
-- the query which indicates the session/connection that shall be delayed at that execution
local qoi = "SELECT loginattempts,uid FROM mybb_users WHERE username='user1' LIMIT 1"

function read_query(packet)
    if string.byte(packet) == proxy.COM_QUERY then
        query = packet:sub(2)
        start_time = os.time()
        if query == qoi then
            print("busy wait")
            while os.time() < start_time + 20 do
                -- nothing
            end
            print("busy wait end")
        end
        print("Connection id: " .. proxy.connection.server.thread_id)
    end
end
However, this script ends up with this output:
Connection id: 36
busy wait
busy wait end
Connection id: 36
Connection id: 36
Connection id: 36
Connection id: 37
Connection id: 37
Connection id: 36
Connection id: 36
Connection id: 36
Connection id: 37
and not the expected
Connection id: 36
busy wait
connection id: 37
connection id: 37
busy wait end
Connection id: 36
Is my intention even achievable, and if so, how?
It seems to be impossible to delay the session within Lua, but it works just as well if I outsource the delay to the MySQL server, as this forces the interleaving too.
local DEBUG = true
local qoi = "SELECT loginattempts,uid FROM mybb_users WHERE username='user1' LIMIT 1"

function read_query(packet)
    ret = nil
    comp_query = qoi
    if string.byte(packet) == proxy.COM_QUERY then
        query = packet:sub(2)
        if query == comp_query then
            if DEBUG then
                print("found matching query " .. packet:sub(2))
                print("insert sleep")
            end
            inj_query = "SELECT sleep(30);"
            new_packet = string.char(proxy.COM_QUERY) .. inj_query
            proxy.queries:append(1, new_packet, { resultset_is_needed = true })
            proxy.queries:append(2, packet, { resultset_is_needed = true })
            ret = proxy.PROXY_SEND_QUERY
        end
    end
    return ret
end

function read_query_result(inj)
    if inj.id == 1 then
        if DEBUG then
            print("sleep query returns")
        end
        return proxy.PROXY_IGNORE_RESULT
    end
    if inj.id == 2 then
        if DEBUG then
            print("regular query returns")
        end
        return
    end
    return
end

Django and Celery beat scheduler: no database entries

My problem is that the beat scheduler doesn't store entries in the 'tasks' and 'workers' tables. I use Django and Celery. In my database (MySQL) I have added a periodic task "Estimate Region" with an interval of 120 seconds.
This is how I start my worker:
`python manage.py celery worker -n worker.node1 -B --loglevel=info &`
After I start the worker, I can see in the terminal that the worker runs and the scheduler picks the periodic task out of the database and executes it.
This is how my task is defined:
@celery.task(name='fv.tasks.estimateRegion',
             ignore_result=True,
             max_retries=3)
def estimateRegion(region):
The terminal shows this:
WARNING ModelEntry: Estimate Region fv.tasks.estimateRegion(*['ASIA'], **{}) {<freq: 2.00 minutes>}
[2013-05-23 10:48:19,166: WARNING/MainProcess] <ModelEntry: Estimate Region fv.tasks.estimateRegion(*['ASIA'], **{}) {<freq: 2.00 minutes>}>
INFO Calculating estimators for exchange:Bombay Stock Exchange
the task "estimate region" returns me a results.csv file, so i can see that the worker and the beat scheduler works. But after that i have no database entries in "tasks" or "workers" in my django admin panel.
Here are my Celery settings in settings.py:
CELERY_DISABLE_RATE_LIMITS = True
CELERY_TASK_SERIALIZER = 'pickle'
CELERY_RESULT_SERIALIZER = 'pickle'
CELERY_IMPORTS = ('fv.tasks')
CELERY_RESULT_PERSISTENT = True
# amqp settings
BROKER_URL = 'amqp://fv:password@localhost'
#BROKER_URL = 'amqp://fv:password@192.168.99.31'
CELERY_RESULT_BACKEND = 'amqp'
CELERY_TASK_RESULT_EXPIRES = 18000
CELERY_ROUTES = (fv.routers.TaskRouter(), )
_estimatorExchange = Exchange('estimator')
CELERY_QUEUES = (
Queue('celery', Exchange('celery'), routing_key='celery'),
Queue('estimator', _estimatorExchange, routing_key='estimator'),
)
# beat scheduler settings
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
# development settings
CELERY_RESULT_PERSISTENT = False
CELERY_DEFAULT_DELIVERY_MODE = 'transient'
I hope someone can help me :)
Have you started celerycam?
python manage.py celerycam
It will take a snapshot (every 1 second by default) of the current state of tasks.
You can read more about it in the Celery documentation.
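Putting the question and answer together, a minimal way to run both processes during development might look like the following; the -E flag makes the worker emit task events, which the snapshot camera needs in order to record state (depending on your settings, events may already be enabled):
# start the worker with embedded beat and events enabled, then the snapshot camera
python manage.py celery worker -n worker.node1 -B -E --loglevel=info &
python manage.py celerycam &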