Whenever I use eventlet's monkey patching (necessary for Flask-SocketIO), the disk_monitor_thread() prevents other threads from running. Eventlet and monkey patching are a must for me. Is there a way to get pyudev's MonitorObserver to play nice and release the GIL with monkey patching?
import time

from threading import Thread
from pyudev import Context, Monitor, MonitorObserver

import eventlet
eventlet.monkey_patch()


def useless_thread():
    while True:
        print('sleep thread 1')
        time.sleep(2)


# Monitor UDEV for drive insertion / removal
def disk_monitor_thread():
    context = Context()
    monitor = Monitor.from_netlink(context)
    monitor.filter_by('block')

    def print_device_event(action, device):
        if 'ID_FS_TYPE' in device and device.get('ID_FS_UUID') == '123-UUIDEXAMPLE':
            print('{0}, {1}'.format(device.action, device.get('ID_FS_UUID')))

    print('Starting Disk Monitor...')
    observer = MonitorObserver(monitor, print_device_event, name='monitor-observer')
    print('Disk Monitor Started')
    observer.start()


t1 = Thread(name='uselessthread', target=useless_thread)
t1.start()

disk_monitor_thread()
Running this results in:
sleep thread 1
Starting Disk Monitor...
and it never moves forward from there.
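One workaround to consider (a sketch only, untested with Flask-SocketIO): skip MonitorObserver and its background OS thread entirely, and instead poll the Monitor from a green thread, pushing the blocking poll() call onto eventlet's thread pool so it cannot stall the hub:

# Sketch: poll pyudev directly from a green thread; tpool.execute() runs the
# blocking Monitor.poll() call in a real OS thread and lets other green
# threads run while it waits.
from eventlet import tpool, spawn

def disk_monitor_green():
    context = Context()
    monitor = Monitor.from_netlink(context)
    monitor.filter_by('block')
    monitor.start()
    while True:
        # Returns a Device, or None if the 1-second timeout expires.
        device = tpool.execute(monitor.poll, 1.0)
        if device is not None and device.get('ID_FS_UUID') == '123-UUIDEXAMPLE':
            print('{0}, {1}'.format(device.action, device.get('ID_FS_UUID')))

spawn(disk_monitor_green)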
I have a django application that is backed by a MySQL database. I have recently moved a section of code out of the request flow and put it into a Process. The code uses select_for_update() to lock affected rows in the DB but now I am occasionally seeing the Process updating a record while it should be locked in the main Thread. If I switch my Executor from a ProcessPoolExecutor to a ThreadPoolExecutor the locking works as expected. I thought that select_for_update() operated at the database level so it shouldn't make any difference whether code is in Threads, Processes, or even on another machine - what am I missing?
I've boiled my code down to a sample that exhibits the same behaviour:
from concurrent import futures
import logging
from time import sleep

from django.db import transaction

from myapp.main.models import CompoundBase

logger = logging.getLogger()

executor = futures.ProcessPoolExecutor()
# executor = futures.ThreadPoolExecutor()


def test() -> None:
    pk = setup()
    f1 = executor.submit(select_and_sleep, pk)
    f2 = executor.submit(sleep_and_update, pk)
    futures.wait([f1, f2])


def setup() -> int:
    cb = CompoundBase.objects.first()
    cb.corporate_id = 'foo'
    cb.save()
    return cb.pk


def select_and_sleep(pk: int) -> None:
    try:
        with transaction.atomic():
            cb = CompoundBase.objects.select_for_update().get(pk=pk)
            print('Locking')
            sleep(5)
            cb.corporate_id = 'baz'
            cb.save()
            print('Updated after sleep')
    except Exception:
        logger.exception('select_and_sleep')


def sleep_and_update(pk: int) -> None:
    try:
        sleep(2)
        print('Updating')
        with transaction.atomic():
            cb = CompoundBase.objects.select_for_update().get(pk=pk)
            cb.corporate_id = 'bar'
            cb.save()
            print('Updated without sleep')
    except Exception:
        logger.exception('sleep_and_update')


test()
When run as shown I get:
Locking
Updating
Updated without sleep
Updated after sleep
But if I change to the ThreadPoolExecutor I get:
Locking
Updating
Updated after sleep
Updated without sleep
The good news is that it's mostly there already; I did some reading around, and this is based on an answer I found here.
I am assuming that you are running on Linux, as that seems to be the platform where this behaviour shows up.
It looks like under Linux the default process start method is fork, which is usually what you want. However, in this exact circumstance it appears that resources (such as DB connections) are shared with the parent, resulting in the DB operations being treated as the same transaction and thus not blocking each other. To get the behaviour you want, each process needs its own resources and must not share them with its parent process (or, by extension, with any other children of that parent).
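As an aside (not part of the original answer): if switching the start method is not an option, another commonly used approach is to keep fork but explicitly close the inherited database connections at the start of each task, so Django opens a fresh, process-local connection. A rough sketch, where _use_fresh_connection is a hypothetical helper:

from django.db import connections

def _use_fresh_connection() -> None:
    # Hypothetical helper: close connections inherited from the parent process
    # via fork; Django transparently reconnects on the next query, giving this
    # process its own connection (and therefore its own transaction).
    connections.close_all()

Each of select_and_sleep and sleep_and_update would call this helper first, before opening the transaction.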
It is possible to get the behaviour you want with the following code; be aware, however, that I had to split the code into two files.
fn.py
import logging
from time import sleep

from django.db import transaction

import django
django.setup()

from myapp.main.models import CompoundBase

logger = logging.getLogger()


def setup() -> int:
    cb = CompoundBase.objects.first()
    cb.corporate_id = 'foo'
    cb.save()
    return cb.pk


def select_and_sleep(pk: int) -> None:
    try:
        with transaction.atomic():
            cb = CompoundBase.objects.select_for_update().get(pk=pk)
            print('Locking')
            sleep(5)
            cb.corporate_id = 'baz'
            cb.save()
            print('Updated after sleep')
    except Exception:
        logger.exception('select_and_sleep')


def sleep_and_update(pk: int) -> None:
    try:
        sleep(2)
        print('Updating')
        with transaction.atomic():
            cb = CompoundBase.objects.select_for_update().get(pk=pk)
            cb.corporate_id = 'bar'
            cb.save()
            print('Updated without sleep')
    except Exception:
        logger.exception('sleep_and_update')
proc_test.py
from concurrent import futures
from multiprocessing import get_context
from time import sleep
import logging

import fn

logger = logging.getLogger()

executor = futures.ProcessPoolExecutor(mp_context=get_context("forkserver"))
# executor = futures.ThreadPoolExecutor()


def test() -> None:
    pk = fn.setup()
    f1 = executor.submit(fn.select_and_sleep, pk)
    f2 = executor.submit(fn.sleep_and_update, pk)
    futures.wait([f1, f2])


test()
There are three strategies for starting a process: fork, spawn, and forkserver. Using either spawn or forkserver appears to give you the behaviour you are looking for.
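For completeness, the only change needed to try the spawn strategy instead is the mp_context argument (same imports as proc_test.py above):

# "spawn" starts each worker in a fresh interpreter, so nothing (including DB
# connections) is inherited from the parent process.
executor = futures.ProcessPoolExecutor(mp_context=get_context("spawn"))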
References:
Multiprocessing Locks
Connections
I have a custom operator that inherits from BaseOperator. I am trying to templatize the 'queue' name so that the task can be picked up by a different Celery worker.
However, it uses the raw template string (the unrendered Jinja string) as the queue name instead of the rendered value.
The same flow works if I give the intended queue name directly as a plain string.
from airflow import DAG
from operators.check_operator import CheckQueueOperator
from datetime import datetime, timedelta
from airflow.operators.python_operator import BranchPythonOperator
from airflow.utils.dates import days_ago

default_args = {
    'schedule_interval': None,  # exclusively "externally triggered" DAG
    'owner': 'admin',
    'description': 'This helps to quickly check queue templatization',
    'start_date': days_ago(1),
    'retries': 0,
    'retry_delay': timedelta(minutes=5),
    'provide_context': True
}

# this goes to the wrong queue --> {{ dag_run.conf["queue"]}}
with DAG('test_queue', default_args=default_args) as dag:
    t1 = CheckQueueOperator(
        task_id='check_q',
        queue='{{ dag_run.conf["queue"]}}'
    )
In the above scenario:
In RabbitMQ, I see the task being queued under the queue name '{{ dag_run.conf["queue"]}}' (the raw template string).
In Airflow, under Rendered Template, I can see the properly rendered value for the queue field.
In the screenshot, we see docker-desktop as the queue name; it is my test queue and also my default Airflow queue. It works perfectly if I give this queue name as a direct string.
# this goes to the right queue --> my_target_queue
with DAG('test_queue', default_args=default_args) as dag:
    t1 = CheckQueueOperator(
        task_id='check_q',
        queue='my_target_queue'
    )
CheckQueueOperator code:
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults

'''
Validate whether queue can be templatized in BaseOperator
'''


class CheckQueueOperator(BaseOperator):
    template_fields = ['queue']

    @apply_defaults
    def __init__(self, *args, **kwargs):
        super(CheckQueueOperator, self).__init__(*args, **kwargs)

    def execute(self, context):
        self.log.info('*******************************')
        self.log.info('Queue name %s', self.queue)
        return
Stack details:
Apache Airflow version - 1.10.12
Using CeleryExecutor
Using RabbitMQ
The queue attribute is reserved (perhaps not officially, but in practice) by BaseOperator. While you may be able to hoodwink the webserver into rendering the attribute, the parts of Airflow that handle scheduling and task execution do not perform template rendering before reading the queue attribute, so the raw Jinja string is what gets used to route the task.
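For contrast (this example is mine, not from the original post), template fields that the operator itself consumes at execute time do render as expected, because rendering happens on the worker just before execute() runs; the queue, by contrast, is read before any rendering takes place. A quick check, reusing default_args from the snippet above with a hypothetical DAG id:

# bash_command is a template field consumed inside execute(), so the rendered
# value of dag_run.conf["queue"] shows up here even though the same expression
# does not work for the queue attribute.
from airflow.operators.bash_operator import BashOperator

with DAG('test_queue_render_check', default_args=default_args) as dag:
    t2 = BashOperator(
        task_id='echo_conf_queue',
        bash_command='echo {{ dag_run.conf["queue"] }}',
    )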
I am using Django and Celery. I have a long-running Celery task and I would like it to report progress. I am doing this:
from celery import shared_task
from django.db.models import F
from django.utils import timezone


@shared_task
def do_the_job(tracker_id, *args, **kwargs):
    while condition:
        # Do a long operation
        tracker = ProgressTracker.objects.get(pk=tracker_id)
        tracker.task_progress = F('task_progress') + 1
        tracker.last_update = timezone.now()
        tracker.save(update_fields=['task_progress', 'last_update'])
The problem is that the view that is supposed to show the progress to the user cannot see the updates until the task finishes. Is there a way to get the Django ORM to ignore transactions for just this one table? Or just this one write?
You can use bound tasks to define custom states for your tasks and set/update the state during execution:
@celery.task(bind=True)
def show_progress(self, n):
    for i in range(n):
        self.update_state(state='PROGRESS', meta={'current': i, 'total': n})
You can dump the state of currently executing tasks to get the progress:
>>> from celery import Celery
>>> app = Celery('proj')
>>> i = app.control.inspect()
>>> i.active()
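If the Django view knows the task id (an assumption; storing it is not shown here) and a result backend is configured, the custom state can also be polled per task via AsyncResult:
>>> from celery.result import AsyncResult
>>> res = AsyncResult(task_id, app=app)  # task_id saved when the task was queued
>>> res.state   # 'PROGRESS' while the loop above is running
>>> res.info    # the meta dict passed to update_state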
I am working on an autonomous car implementation for a web browser game with Python 2.x. I use the Tornado web server to run the game on localhost; I post and receive data from the game in JSON format in the function called "FrameHandler", and I determine what the car's action should be in the "to_dict_faster()" function.
My problem is that I can write the data held in the speed_data variable to a text file at a specific time interval with the help of a coroutine. However, I can't dump JSON data to the function at that same interval, because "FrameHandler" acts like a while True loop and always requests data to dump. What I am trying to do is send the desired actions, and write the text file, at a specific time interval without changing the flow of the frame handler, because that affects the FPS of the game.
I have been trying to figure out how to do this for a long time; any help would be great:
@gen.coroutine
def sampler():
    io_loop = tornado.ioloop.IOLoop.current()
    start = time.time()
    while True:
        with open("Sampled_Speed.txt", "a") as text_file:
            text_file.write("%d,%.2f\n" % (speed_data, (time.time() - start)))
        yield gen.Task(io_loop.add_timeout, io_loop.time() + period)
class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.redirect("/static/v2.curves.html")


class FrameHandler(tornado.web.RequestHandler):
    def post(self):
        global speed_data
        data = json.loads(self.get_arguments("telemetry")[0])
        ar = np.fromstring(base64.decodestring(self.request.body), dtype=np.uint8)
        image = ar.reshape(hp.INPUT_SIZE, hp.INPUT_SIZE, hp.NUM_CHANNELS)
        left, right, faster, slower = data["action"]
        terminal, action, all_data, was_start = (
            data["terminal"],
            Action(left=left, right=right, faster=faster, slower=slower),
            data["all_data"],
            data["was_start"]
        )
        for i in range(len(all_data)):
            data_dict = all_data[i]
            speed_data = data_dict[u'speed']
            position_data = data_dict[u'position']
        result_action = agent.steps(image, 0.1, terminal, was_start, action, all_data)
        if speed_data < 4000:
            self.write(json.dumps(result_action.to_dict_faster()))
        else:
            self.write(json.dumps(result_action.to_dict_constant()))


def make_app():
    return tornado.web.Application([
        (r"/", MainHandler),
        (r"/frame", FrameHandler),
        (r"/static/(.*)", tornado.web.StaticFileHandler, {"path": static_path})
    ], debug=True)


if __name__ == "__main__":
    app = make_app()
    if "SERVER_PORT" in os.environ:
        port = int(os.environ["SERVER_PORT"])
    else:
        port = 8880
    print "LISTENING ON PORT: %d" % port
    app.listen(port)
    tornado.ioloop.IOLoop.current().run_sync(sampler)
    tornado.ioloop.IOLoop.current().start()
You can move the file writing to a different thread (using Tornado's run_on_executor, for example), so the Python interpreter will automatically switch from the Sampler to the main thread with FrameHandler on write. But you have to make speed_data thread-safe; I've used the stdlib Queue.Queue as an example:
import concurrent.futures
import Queue

import tornado.web
from tornado import gen
from tornado.concurrent import run_on_executor
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop
from tornado.web import Application


class Handler(tornado.web.RequestHandler):
    @gen.coroutine
    def get(self):
        global speed_data
        speed_data.put("REALLY BIG TEST DATA\n")
        self.finish("OK")


class Sampler():
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)

    def __init__(self, queue):
        self._q = queue

    @run_on_executor
    def write_sample(self):
        with open("foobar.txt", "w") as f:
            while True:
                data = self._q.get()
                f.write(data)


if __name__ == '__main__':
    application = Application(
        [("/status", Handler)]
    )
    server = HTTPServer(application)
    server.listen(8888)
    speed_data = Queue.Queue()
    smp = Sampler(speed_data)
    IOLoop.current().add_callback(smp.write_sample)
    IOLoop.current().start()
My WSGI application uses SQLAlchemy. I want to start a session when a request starts, commit it if it is dirty and request processing finished successfully, and roll it back otherwise. So, I need to implement the behaviour of Django's TransactionMiddleware.
So, I suppose that I should create a WSGI middleware and do the following:
1. Create a DB session and add it to environ during pre-processing.
2. Get the DB session from environ and call commit() during post-processing, if no errors occurred.
3. Get the DB session from environ and call rollback() during post-processing, if some errors occurred.
Step 1 is obvious for me:
class DbSessionMiddleware:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        environ['db_session'] = create_session()
        return self.app(environ, start_response)
Steps 2 and 3 are not. I found this example of a post-processing task:
class Caseless:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        for chunk in self.app(environ, start_response):
            yield chunk.lower()
It contains this comment:
Note that the __call__ function is a Python generator, which is typical for this sort of "post-processing" task.
Could you please clarify how this works, and how I can solve my issue in a similar way?
Thanks,
Boris.
For step 1 I use SQLAlchemy scoped sessions:
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine(settings.DB_URL, echo=settings.DEBUG, client_encoding='utf8')
Base = declarative_base()
sm = sessionmaker(bind=engine)
get_session = scoped_session(sm)
They return the same thread-local session for each get_session() call.
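A tiny illustration of what thread-local means here (my addition, purely for clarity):

# Within one thread, every call hands back the same Session instance, so the
# middleware and the view code share a single session and transaction.
s1 = get_session()
s2 = get_session()
assert s1 is s2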
Steps 2 and 3 currently look like this:
class DbSessionMiddleware:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        try:
            db.get_session().begin_nested()
            return self.app(environ, start_response)
        except BaseException:
            db.get_session().rollback()
            raise
        finally:
            db.get_session().commit()
As you can see, I start a nested transaction on the session so that I can roll back even queries that have already been committed in the views.
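One extra detail I would suggest (my addition, not part of the setup above): dispose of the thread-local session at the end of each request with scoped_session.remove(), so the next request served by the same worker thread starts with a fresh session. A sketch of the middleware with that added, which also commits only on success instead of in finally:

class DbSessionMiddleware:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        session = db.get_session()
        try:
            session.begin_nested()
            result = self.app(environ, start_response)
            session.commit()
            return result
        except BaseException:
            session.rollback()
            raise
        finally:
            # scoped_session.remove() drops the thread-local session so the
            # next request on this thread gets a clean one.
            db.get_session.remove()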