We randomly get the exception below while running a Quartz clustered scheduler on 6 instances:
Couldn't acquire next trigger: Deadlock found when trying to get lock;
try restarting transaction
Here is our quartzConfig.properties
scheduler.skipUpdateCheck = true
scheduler.instanceName = 'quartzScheduler'
scheduler.instanceId = 'AUTO'
threadPool.threadCount = 13
threadPool.threadPriority = 5
jobStore.misfireThreshold = 300000
jobStore.'class' = 'org.quartz.impl.jdbcjobstore.JobStoreTX'
jobStore.driverDelegateClass = 'org.quartz.impl.jdbcjobstore.StdJDBCDelegate'
jobStore.useProperties = true
jobStore.dataSource = 'myDS'
jobStore.tablePrefix = 'QRTZ_'
jobStore.isClustered = true
jobStore.clusterCheckinInterval = 10000
dataSource.myDS.driver='com.mysql.jdbc.Driver'
dataSource.myDS.maxConnections = 15
We are using the Quartz Grails plugin (with Quartz 2.2.1) in our application with a MySQL database.
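For reference, a sketch of how these plugin keys would look as standard Quartz properties (assuming the Grails plugin simply adds the org.quartz. prefix; worth verifying against the generated configuration), plus one property that is often discussed in connection with MySQL deadlocks on trigger acquisition, included here only as an assumption to test:

org.quartz.scheduler.instanceName = quartzScheduler
org.quartz.scheduler.instanceId = AUTO
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval = 10000
# Assumption to evaluate, not a confirmed fix for this deadlock:
# org.quartz.jobStore.acquireTriggersWithinLock = true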
I'm monitoring about 1300 hosts with Zabbix. After I applied templates to all hosts in bulk, I got the "Utilization of housekeeper processes over 75%" alarm, and it has not been resolved for about 20 hours. I did not configure the housekeeper in my server settings. How can I resolve this alarm, and what is its effect? I'm using PostgreSQL.
Server config:
StartPollers=50
StartPollersUnreachable=50
StartPingers=50
StartDiscoverers=50
StartHTTPPollers=50
CacheSize=1024M
HistoryCacheSize=1024M
TrendCacheSize=1024M
ValueCacheSize=1024M
LogSlowQueries=3000
MaxHousekeeperDelete=5000
PostgreSQL config:
max_connections = 1000
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 1048kB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 12
max_parallel_workers_per_gather = 4
max_parallel_workers = 12
max_parallel_maintenance_workers = 4
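For context, the housekeeper itself is tuned through zabbix_server.conf parameters that are not listed above, so the defaults apply; a hedged sketch of the relevant knobs (values are illustrative, not recommendations):

# Hours between housekeeper runs (default 1; 0 disables periodic housekeeping)
HousekeepingFrequency=1
# Maximum rows deleted per item in one housekeeping cycle (already set above)
MaxHousekeeperDelete=5000

With roughly 1300 hosts on PostgreSQL, a common alternative is to partition the history/trends tables and disable internal housekeeping for them in the frontend, but whether that fits depends on your retention requirements.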
I installed GitLab Runner on my VPS, but every time I use the following in gitlab-ci.yml:
tags:
- vps
the job is held for at least 5 minutes each time.
I installed GitLab Runner with:
apt-get install gitlab-runner
whereas if I use the gitlab-runner from gitlab.com I don't get the hold.
gitlab-runner info:
gitlab-runner status
Runtime platform arch=amd64 os=linux pid=662441 revision=133d7e76 version=15.6.1
concurrent = 1
check_interval = 0
shutdown_timeout = 0
[session_server]
session_timeout = 1800
runner 1:
[[runners]]
name = "name1"
url = "https://gitlab.com/"
id = 123
token = "ABC"
token_obtained_at = 2022-11-26T20:24:16Z
token_expires_at = 0001-01-01T00:00:00Z
executor = "shell"
[runners.custom_build_dir]
[runners.cache]
MaxUploadedArchiveSize = 0
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
runner2:
[[runners]]
name = "runner2"
url = "https://gitlab.com/"
id = 456
token = "ABC"
token_obtained_at = 2022-11-26T20:34:45Z
token_expires_at = 0001-01-01T00:00:00Z
executor = "docker"
[runners.custom_build_dir]
[runners.cache]
MaxUploadedArchiveSize = 0
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
[runners.docker]
tls_verify = false
image = "ruby:latest"
privileged = false
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
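Two config.toml values relate to how quickly jobs are picked up; shown here only as a hedged sketch (the numbers are illustrative, not a confirmed fix):

# concurrent limits how many jobs this runner process can run at once (currently 1)
concurrent = 4
# check_interval is the number of seconds between checks for new jobs;
# 0 means the default of 3 seconds is used
check_interval = 3

Since check_interval = 0 already means a poll every few seconds, the polling interval alone is unlikely to account for a 5-minute hold.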
Thank you.
My producer isn't throwing any errors, but data is not being sent to the destination topic. Can you recommend any techniques to debug this situation?
I have a call to a Confluent Python AvroProducer inside a synchronous loop to send data to a topic, like so:
self.producer.produce(topic="test2", value=msg_dict)
After this call I have a piece of code like so to flush the queue:
num_messages_in_queue = self.producer.flush(timeout = 2.0)
print(f"flushed {num_messages_in_queue} messages from producer queue in iteration {num_iterations} ")
This executes without any error, but no callback is fired after it runs. My producer is initialized as follows:
def __init__(self, broker_url=None, topic=None, schema_registry_url=None, schema_path=None):
    try:
        with open(schema_path, 'r') as content_file:
            schema = avro.loads(content_file.read())
    except Exception as e:
        print(f"Error when trying to read avro schema file : {schema_path}")
    self.conf = {
        'bootstrap.servers': broker_url,
        'on_delivery': self.delivery_report,
        'schema.registry.url': schema_registry_url,
        'acks': -1,  # This guarantees that the record will not be lost as long as at least one in-sync replica remains alive.
        'enable.idempotence': False,
        "error_cb": self.error_cb
    }
    self.topic = topic
    self.schema_path = schema_path
    self.producer = AvroProducer(self.conf, default_key_schema=schema, default_value_schema=schema)
My callback method is as follows:
def delivery_report(self, err, msg):
    print(f"began delivery_report")
    if err is None:
        print(f"delivery_report --> Delivered msg.value = {msg.value()} to topic= {msg.topic()} offset = {msg.offset()} without err.")
    else:
        print(f"conf_worker AvroProducer failed to deliver message {msg.value()} to topic {self.topic}. got error= {err}")
After my producer code executes, I look at my topic from the Schema Registry container like so:
docker exec schema_registry_container kafka-avro-console-consumer --bootstrap-server kafka:29092 --topic test2 --from-beginning
I see this output:
[2020-04-03 15:48:38,064] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2020-04-03 15:48:38,742] INFO ConsumerConfig values:
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [kafka:29092]
check.crcs = true
client.dns.lookup = default
client.id =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = console-consumer-49056
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
(org.apache.kafka.clients.consumer.ConsumerConfig)
[2020-04-03 15:48:38,887] INFO Kafka version : 2.1.0-cp1 (org.apache.kafka.common.utils.AppInfoParser)
[2020-04-03 15:48:38,887] INFO Kafka commitId : bda8715f42a1a3db (org.apache.kafka.common.utils.AppInfoParser)
[2020-04-03 15:48:39,221] INFO Cluster ID: KHKziPBvRKiozobbwvP1Fw (org.apache.kafka.clients.Metadata)
[2020-04-03 15:48:39,224] INFO [Consumer clientId=consumer-1, groupId=console-consumer-49056] Discovered group coordinator kafka:29092 (id: 2147483646 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-04-03 15:48:39,231] INFO [Consumer clientId=consumer-1, groupId=console-consumer-49056] Revoking previously assigned partitions [] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2020-04-03 15:48:39,231] INFO [Consumer clientId=consumer-1, groupId=console-consumer-49056] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-04-03 15:48:42,264] INFO [Consumer clientId=consumer-1, groupId=console-consumer-49056] Successfully joined group with generation 1 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-04-03 15:48:42,267] INFO [Consumer clientId=consumer-1, groupId=console-consumer-49056] Setting newly assigned partitions [test2-0] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2020-04-03 15:48:42,293] INFO [Consumer clientId=consumer-1, groupId=console-consumer-49056] Resetting offset for partition test2-0 to offset 0. (org.apache.kafka.clients.consumer.internals.Fetcher)
So the answer is so trivial that it's embarrassing!
But it does point to the fact that in a multilayered infrastructure, a single incorrectly set value can result in a silent failure that is very tedious to track down.
The issue came from an incorrect parameter setting in my docker-compose.yml file, where the environment variable for broker_url was not set.
The application code needed this variable to reference the Kafka broker.
However, no exception was thrown for the missing parameter, and it failed silently.
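A hedged sketch of the kind of fail-fast check that would have surfaced the missing setting immediately (the validation is my addition, not part of the original constructor; parameter names mirror the __init__ shown above):

def __init__(self, broker_url=None, topic=None, schema_registry_url=None, schema_path=None):
    # Fail fast on missing configuration instead of letting the producer sit idle.
    required = {"broker_url": broker_url, "topic": topic,
                "schema_registry_url": schema_registry_url, "schema_path": schema_path}
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"Missing required producer configuration: {missing}")
    # ... then build self.conf and the AvroProducer exactly as before ...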
I want a mysql-proxy Lua script to handle interleaved accesses to a website (e.g. two different browser windows/users) while being able to pause/delay one of the two sessions without influencing the other. Handling sessions in an interleaved fashion seems possible in mysql-proxy Lua (judging from the output listed below), but as soon as I add a delay in the script, it blocks everything and the other session cannot advance either.
-- the query which indicates the session/connection that shall be delayed at that execution
local qoi = "SELECT loginattempts,uid FROM mybb_users WHERE username='user1' LIMIT 1"
function read_query(packet)
    if string.byte(packet) == proxy.COM_QUERY then
        query = packet:sub(2)
        start_time = os.time()
        if query == qoi then
            print("busy wait")
            while os.time() < start_time + 20 do
                -- nothing
            end
            print("busy wait end")
        end
        print("Connection id: " .. proxy.connection.server.thread_id)
    end
end
However, this script ends up with this output:
Connection id: 36
busy wait
busy wait end
Connection id: 36
Connection id: 36
Connection id: 36
Connection id: 37
Connection id: 37
Connection id: 36
Connection id: 36
Connection id: 36
Connection id: 37
and not the expected:
Connection id: 36
busy wait
connection id: 37
connection id: 37
busy wait end
Connection id: 36
Is my intention even achievable, and if so, how?
It seems to be impossible to delay the session within the Lua script itself (the busy wait blocks the whole proxy, so every connection stalls), but it works just as well if I outsource the delay to the MySQL server, as this forces the interleaving too: only the delayed connection waits on the server.
local DEBUG = true
local qoi = "SELECT loginattempts,uid FROM mybb_users WHERE username='user1' LIMIT 1"

function read_query(packet)
    ret = nil
    comp_query = qoi
    if string.byte(packet) == proxy.COM_QUERY then
        query = packet:sub(2)
        if query == comp_query then
            if DEBUG then
                print("found matching query " .. packet:sub(2))
                print("insert sleep")
            end
            -- inject a SELECT sleep(30) ahead of the matching query;
            -- the delay happens on the MySQL server, so only this connection waits
            inj_query = "SELECT sleep(30);"
            new_packet = string.char(proxy.COM_QUERY) .. inj_query
            proxy.queries:append(1, new_packet, { resultset_is_needed = true })
            proxy.queries:append(2, packet, { resultset_is_needed = true })
            ret = proxy.PROXY_SEND_QUERY
        end
    end
    return ret
end

function read_query_result(inj)
    if inj.id == 1 then
        if DEBUG then
            print("sleep query returns")
        end
        -- discard the injected query's result so the client never sees it
        return proxy.PROXY_IGNORE_RESULT
    end
    if inj.id == 2 then
        if DEBUG then
            print("regular query returns")
        end
        return
    end
    return
end
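For completeness, this is roughly how such a script is attached to mysql-proxy (a hedged sketch; the backend address, listening port, and script path are illustrative assumptions):

mysql-proxy --proxy-backend-addresses=127.0.0.1:3306 --proxy-address=:4040 --proxy-lua-script=/path/to/delay.lua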
My problem is that the beat scheduler doesn't store entries in the 'tasks' and 'workers' tables. I use Django and Celery. In my database (MySQL) I have added a periodic task "Estimate Region" with an interval of 120 seconds.
This is how I start my worker:
`python manage.py celery worker -n worker.node1 -B --loglevel=info &`
After I start the worker, I can see in the terminal that the worker runs and the scheduler picks up the periodic task from the database and executes it.
This is how my task is defined:
@celery.task(name='fv.tasks.estimateRegion',
             ignore_result=True,
             max_retries=3)
def estimateRegion(region):
The terminal shows this:
WARNING ModelEntry: Estimate Region fv.tasks.estimateRegion(*['ASIA'], **{}) {<freq: 2.00 minutes>}
[2013-05-23 10:48:19,166: WARNING/MainProcess] <ModelEntry: Estimate Region fv.tasks.estimateRegion(*['ASIA'], **{}) {<freq: 2.00 minutes>}>
INFO Calculating estimators for exchange:Bombay Stock Exchange
the task "estimate region" returns me a results.csv file, so i can see that the worker and the beat scheduler works. But after that i have no database entries in "tasks" or "workers" in my django admin panel.
Here are my Celery settings in settings.py:
CELERY_DISABLE_RATE_LIMITS = True
CELERY_TASK_SERIALIZER = 'pickle'
CELERY_RESULT_SERIALIZER = 'pickle'
CELERY_IMPORTS = ('fv.tasks')
CELERY_RESULT_PERSISTENT = True

# amqp settings
BROKER_URL = 'amqp://fv:password@localhost'
#BROKER_URL = 'amqp://fv:password@192.168.99.31'
CELERY_RESULT_BACKEND = 'amqp'
CELERY_TASK_RESULT_EXPIRES = 18000

CELERY_ROUTES = (fv.routers.TaskRouter(), )

_estimatorExchange = Exchange('estimator')
CELERY_QUEUES = (
    Queue('celery', Exchange('celery'), routing_key='celery'),
    Queue('estimator', _estimatorExchange, routing_key='estimator'),
)

# beat scheduler settings
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"

# development settings
CELERY_RESULT_PERSISTENT = False
CELERY_DEFAULT_DELIVERY_MODE = 'transient'
I hope someone can help me :)
Have you started celerycam?
python manage.py celerycam
It will take a snapshot (every 1 second by default) of the current state of tasks.
You can read more about it in the Celery documentation.
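If celerycam is running and the tables are still empty, it may also be worth checking that the worker actually sends events, since the camera only records events it receives. A hedged sketch (the -E flag enables worker events; the --frequency value is illustrative):

python manage.py celery worker -n worker.node1 -B -E --loglevel=info &
python manage.py celerycam --frequency=10.0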