Locks in SQLAlchemy

I am using SQLAlchemy to run transactions against a PostgreSQL database from a Python application.
I don't know how to use locks in SQLAlchemy.
Can anybody help me use locks in SQLAlchemy?
I am facing the following problem:
while running two instances of the application in parallel, both try to insert rows into the same table, and sometimes I get a duplicate primary key error. Can I resolve this using locks?
Best Regards,
Suji

To take a lock explicitly you can use the engine or a connection to issue raw SQL to the database. In PostgreSQL a table lock can only be taken inside a transaction and is released automatically when it ends:
from sqlalchemy import text

with engine.begin() as conn:
    conn.execute(text('LOCK TABLE tablename IN ACCESS EXCLUSIVE MODE'))
    # do your stuff....
# the lock is released when the transaction commits (there is no UNLOCK in PostgreSQL)
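
If the real problem is that two application instances sometimes generate the same primary key, a full table lock is fairly heavy-handed. A lighter alternative (just a sketch, assuming a hypothetical items table, an assumed connection URL, and SQLAlchemy 1.1+ with the PostgreSQL dialect) is to let the database skip the conflicting row:

from sqlalchemy import create_engine, Table, Column, Integer, String, MetaData
from sqlalchemy.dialects.postgresql import insert

engine = create_engine('postgresql://user:pass@localhost/mydb')  # assumed connection URL
metadata = MetaData()
items = Table('items', metadata,                                  # hypothetical table
              Column('id', Integer, primary_key=True),
              Column('name', String))

with engine.begin() as conn:
    stmt = insert(items).values(id=1, name='example')
    # ON CONFLICT DO NOTHING: a duplicate insert is silently skipped instead of raising
    conn.execute(stmt.on_conflict_do_nothing(index_elements=['id']))

Letting the primary key come from a database sequence (SERIAL) instead of being computed in the application also removes the race entirely.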

Related

Django & MariaDB/MySQL: Does select_for_update lock rows from subqueries? Causing deadlocks?

Software: Django 2.1.0, Python 3.7.1, MariaDB 10.3.8, Linux Ubuntu 18LTS
We recently added some load to a new application and started observing lots of deadlocks. After a lot of digging, I found out that the Django select_for_update query resulted in SQL with several subqueries (3 or 4). In all deadlocks I've seen so far, at least one of the transactions involves this SQL with multiple subqueries.
My question is... does select_for_update lock records from every table involved? In my case, would records from the main SELECT and from the other tables used by subqueries get locked, or only records from the main SELECT?
From Django docs:
By default, select_for_update() locks all rows that are selected by the query. For example, rows of related objects specified in select_related() are locked in addition to rows of the queryset’s model.
However, I'm not using select_related(), at least not explicitly.
Summary of my app:
with transaction.atomic():
    ModelName.objects.select_for_update().filter(...)
    ...
    # update the record that is locked
    ...
50+ clients sending queries to the database concurrently
Some of those queries ask for the same record, meaning different transactions will run the same SQL at the same time.
After a lot of reading, I did the following to try to get the deadlock under control:
1- Try/catch the exception for error 1213 (deadlock). When this happens, wait 30 seconds and retry the query (a sketch of this retry logic is shown below). Here, I rely on the ROLLBACK performed by the database engine.
Also, print the output of SHOW ENGINE INNODB STATUS and SHOW PROCESSLIST. But SHOW PROCESSLIST doesn't give useful information.
2- Modify the Django select_for_update so that it doesn't build SQL with subqueries. Now the generated SQL contains a single WHERE with values and no subqueries.
Anything else that could be done to reduce the deadlocks?
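For reference, a minimal sketch of the retry logic from step 1 (ModelName is the model from the summary above; the function name and the 30-second wait are assumptions; MySQL/MariaDB report deadlocks as error code 1213):

import time
from django.db import OperationalError, transaction

def update_with_retry(pk, attempts=3, wait=30):
    for attempt in range(attempts):
        try:
            with transaction.atomic():
                # ModelName is the model from the summary above
                obj = ModelName.objects.select_for_update().get(pk=pk)
                # ... modify the locked record ...
                obj.save()
            return
        except OperationalError as exc:
            # 1213 = ER_LOCK_DEADLOCK; InnoDB has already rolled the transaction back
            if exc.args and exc.args[0] == 1213 and attempt < attempts - 1:
                time.sleep(wait)
            else:
                raise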
If you have select_for_update inside a transaction, the lock will only be released when the whole transaction commits or rolls back. With nowait set to true, the other concurrent requests will immediately fail with:
(3572, 'Statement aborted because lock(s) could not be acquired immediately and NOWAIT is set.')
So if we can't use optimistic locks and cannot make transactions shorter, we can set nowait=True in our select_for_update, and we will see a lot of failures if our assumptions are correct. Here we can just catch those lock failures and retry them with a backoff strategy. This is based on the assumption that everyone is trying to write to the same thing, like an auction item or a ticket booking within a short window of time. If that is not the case, consider changing the db design a bit to make deadlocks less common.
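A minimal sketch of that nowait approach (assuming the ModelName model from the question and a simple linear backoff; per the Django docs a conflicting lock raises DatabaseError when nowait=True):

import time
from django.db import DatabaseError, transaction

def update_with_nowait(pk, attempts=5, backoff=0.5):
    for attempt in range(attempts):
        try:
            with transaction.atomic():
                # ModelName is the model from the question's summary
                obj = ModelName.objects.select_for_update(nowait=True).get(pk=pk)
                # ... update the locked record ...
                obj.save()
            return True
        except DatabaseError:
            # another transaction holds the lock (error 3572); back off and retry
            time.sleep(backoff * (attempt + 1))
    return False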

Mysql - detect transaction that inserts data

I'm trying to identify which transaction inserts data into a db table on a MySQL 5.1 server using the InnoDB engine. Unfortunately I don't have access to the source code and thus have to guess what's happening by looking at the data in the DB.
It seems to me that a dataset that is supposed to be written to the db in a single transaction is instead written in 2 separate transactions. I would like to test whether this assumption is true.
In order to do that, my idea was to add a column to my table, say TransactionID, and then use a trigger to copy the transaction ID value into that column.
But I've found that it seems impossible to obtain the InnoDB transaction ID in MySQL 5.1.
Do you know whether there are other options to identify the transaction involved in a data insertion?

Nodejs and mysql duplicate issue

I'm using the mysql module for nodejs, and my program is running multiple workers with the cluster package.
Each worker gets some tweets and stores them in the database, but each record has to be unique by tweet_id.
Now, I can see that a lot of duplicate records are present in my database, even if I'm checking for duplicates before inserting.
So, have you ever experienced this? Is there a solution?
I'm a newbie but I'll try to help.
Since you have multiple workers, under the default transaction isolation level, while worker A is checking for duplicates and writing to the db, worker B may have already written the same tweet, but that write is not yet visible to A.
One common solution is to set a unique index on the tweet_id column so the database itself rejects duplicates.
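The same idea in a quick Python sketch, just to show the SQL involved (the equivalent with the Node.js mysql module is analogous; the table and column names here are assumptions):

import pymysql  # any MySQL client works; the point is the SQL, not the driver

conn = pymysql.connect(host='localhost', user='user', password='pw', database='tweets_db')
with conn.cursor() as cur:
    # one-time setup: let the database enforce uniqueness instead of checking first
    cur.execute('ALTER TABLE tweets ADD UNIQUE KEY uniq_tweet_id (tweet_id)')
    # INSERT IGNORE silently skips rows that would violate the unique key
    cur.execute('INSERT IGNORE INTO tweets (tweet_id, body) VALUES (%s, %s)',
                ('1234567890', 'hello world'))
conn.commit()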

Best way for multi-threaded inserts and updates on a mysql table with String primary key?

It seems like a simple question, but I am not able to figure out the best possible solution to this problem. I have a multithreaded as well as multiprocess framework with reasonable concurrency (6-7 threads over 3 machines). Each process either inserts into or updates the same table, which has a string field as the primary key. What would be the best possible way to ensure thread-safe execution and avoid deadlocks?
Any help will be greatly appreciated.
thanks,
Amit
You could push the queries onto a Redis set from the multiple threads. Then have a single dedicated thread split those queries into two groups (inserts and updates) and perform the db operations from that thread. That way the deadlock can be avoided.
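A rough sketch of that pattern with redis-py (the key name and payload format are assumptions; a Redis set is used since the answer mentions one):

import json
import time
import redis

r = redis.Redis()  # assumes a local Redis instance

def enqueue(op, row):
    # called from the worker threads/processes instead of touching MySQL directly
    r.sadd('pending_ops', json.dumps({'op': op, 'row': row}))

def drain():
    # run in one dedicated thread: the only place that talks to the database
    while True:
        raw = r.spop('pending_ops')
        if raw is None:
            time.sleep(0.1)
            continue
        item = json.loads(raw)
        if item['op'] == 'insert':
            pass  # execute the INSERT here
        else:
            pass  # execute the UPDATE here

Serializing all writes through one consumer trades some throughput for the guarantee that two statements never contend for the same row lock.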

Entity Framework code first with mysql in Production

I am creating an ASP.NET MVC application using EF code first. I had been using SQL Azure as my database, but it turns out SQL Azure is not reliable, so I am thinking of using MySQL/PostgreSQL for the database.
I wanted to know the repercussions/implications of using EF code first with MySQL/PostgreSQL in terms of performance.
Has anyone used this combo in production or knows anyone who has used it?
EDIT
I keep on getting the following exceptions in SQL Azure:
SqlException: "A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)"
SqlException: "Database 'XXXXXXXXXXXXXXXX' on server 'XXXXXXXXXXXXXXXX' is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of '4acac87a-bfbe-4ab1-bbb6c-4b81fb315da'. Login failed for user 'XXXXXXXXXXXXXXXX'."
First, your problem seems to be a network issue, perhaps with your ISP. You may want to look at getting a remote PostgreSQL or MySQL db, but I think you will run into the same problems.
Secondly, comparing MySQL and PostgreSQL performance is relatively tricky. In general, MySQL is optimized for pkey lookups, and PostgreSQL is more generally optimized for complex use cases. This may be a bit low-level but....
MySQL InnoDB tables are basically btree indexes where the leaf node includes the table data. The primary key is the key of the index. If no primary key is provided, one will be created for you. This means two things:
select * from my_large_table will be slow as there is no support for a physical order scan.
Select * from my_large_table where secondary_index_value = 2 requires two index traversals, since the secondary index can only refer to the primary key values.
In contrast a selection for a primary key value will be faster than on PostgreSQL because the index contains the data.
PostgreSQL by comparison stores information in an unordered way in a series of heap pages. The indexes are separate from the data. If you want to pull by primary key, you scan the index, then read the data page in which the data is found, and then pull the data. In comparison, pulling from a secondary index is not any slower. Additionally, the tables are structured such that sequential disk access is possible, so a long select * from my_large_table will benefit significantly from the operating system's read-ahead cache.
In short, if your queries are simply joinless selection by primary key, then MySQL will give you better performance. If you have joins and such, PostgreSQL will do better.